Dave Jones reported the following:

  This made it into 5.13 final, and completely breaks NFSD for me
  (Serving tcp v3 mounts). Existing mounts on clients hang, as do
  new mounts from new clients. Rebooting the server back to rc7
  everything recovers.

Commit b3b64ebd3822 ("mm/page_alloc: do bulk array bounds check after
checking populated elements") returns the wrong value if the array is
already populated: it returns 0 instead of the number of populated
elements, which the caller interprets as an allocation failure. Dave
reported that this patch fixes his problem, and it also passed a test
running dbench over NFS.
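
For illustration only, here is a minimal userspace sketch of the failure
mode. It is not kernel code: struct fake_page, model_alloc_bulk(),
model_caller_fill() and the fixed-size pool are hypothetical names made up
for this example, and the retry loop only assumes a caller that treats
"fewer pages than requested" as a reason to try again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct fake_page { int id; };

static struct fake_page pool[POOL_SIZE];
static size_t pool_next;

/*
 * Simplified model of a bulk allocator: fill the empty (NULL) slots of
 * array[0..nr-1] and return the total number of populated slots.
 * With buggy == true it mimics the regression and returns 0 when the
 * array is already fully populated on entry.
 */
static size_t model_alloc_bulk(size_t nr, struct fake_page **array, bool buggy)
{
	size_t populated = 0;

	/* Skip already populated elements at the start of the array. */
	while (populated < nr && array[populated])
		populated++;

	/* Already populated array? */
	if (nr - populated == 0)
		return buggy ? 0 : populated;

	for (size_t i = populated; i < nr; i++) {
		if (!array[i]) {
			if (pool_next == POOL_SIZE)
				break;	/* model pool exhausted */
			array[i] = &pool[pool_next++];
		}
		populated++;
	}
	return populated;
}

/*
 * Hypothetical caller: retry until the array is reported full. The retry
 * limit exists only so the sketch terminates; a caller without one would
 * spin forever on the buggy return value.
 */
static bool model_caller_fill(size_t nr, struct fake_page **array,
			      bool buggy, int max_tries)
{
	for (int t = 0; t < max_tries; t++) {
		size_t filled = model_alloc_bulk(nr, array, buggy);

		assert(filled <= nr);
		if (filled == nr)
			return true;
	}
	return false;
}
```

Calling model_caller_fill() a second time on an array that is already full
fails with buggy == true and succeeds with buggy == false, which matches
the one-line change in the patch below.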
Fixes: b3b64ebd3822 ("mm/page_alloc: do bulk array bounds check after checking populated elements")
Reported-and-tested-by: Dave Jones <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Cc: <[email protected]> [5.13+]
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ef2265f86b91..04220581579c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5058,7 +5058,7 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
 	/* Already populated array? */
 	if (unlikely(page_array && nr_pages - nr_populated == 0))
-		return 0;
+		return nr_populated;
 	/* Use the single page allocator for one page. */
 	if (nr_pages - nr_populated == 1)
Hi Mel,
On Mon, Jun 28, 2021 at 5:29 PM Mel Gorman <[email protected]> wrote:
> Dave Jones reported the following
>
> This made it into 5.13 final, and completely breaks NFSD for me
> (Serving tcp v3 mounts). Existing mounts on clients hang, as do
> new mounts from new clients. Rebooting the server back to rc7
> everything recovers.
>
> The commit b3b64ebd3822 ("mm/page_alloc: do bulk array bounds check after
> checking populated elements") returns the wrong value if the array is
> already populated which is interpreted as an allocation failure. Dave
> reported this fixes his problem and it also passed a test running dbench
> over NFS.
>
> Fixes: b3b64ebd3822 ("mm/page_alloc: do bulk array bounds check after checking populated elements")
> Reported-and-tested-by: Dave Jones <[email protected]>
> Signed-off-by: Mel Gorman <[email protected]>
> Cc: <[email protected]> [5.13+]
I saw failures similar to those Mike Galbraith reported when doing s2idle
or s2ram on some boards with some configs:
Freezing of tasks failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
task:NFSv4 callback state:S stack: 0 pid: 280 ppid: 2 flags:0x00000000
[<c094b634>] (__schedule) from [<c094b8d0>] (schedule+0xc0/0x110)
[<c094b8d0>] (schedule) from [<c094faec>] (schedule_timeout+0xc8/0x108)
[<c094faec>] (schedule_timeout) from [<c092e0a0>] (svc_recv+0x108/0xa30)
[<c092e0a0>] (svc_recv) from [<c04c5990>] (nfs4_callback_svc+0x6c/0x84)
[<c04c5990>] (nfs4_callback_svc) from [<c0244ddc>] (kthread+0x128/0x138)
[<c0244ddc>] (kthread) from [<c0200114>] (ret_from_fork+0x14/0x20)
I've bisected it (twice, as I couldn't believe the result) to the
same commit, which helped me find the fix.
After cherry-picking commit 66d9282523b32281 ("mm/page_alloc: Correct
return value of populated elements if bulk array is populated"),
the problem went away.
Tested-by: Geert Uytterhoeven <[email protected]>
Gr{oetje,eeting}s,
Geert