Hello Chuck,
I believe commit 5f7fc5d "SUNRPC: Resupply rq_pages from node-local memory" in
Linux 6.5+ is incorrect. It unconditionally passes rq_pool->sp_id as the NUMA
node.
While the comment in the svc_pool declaration in sunrpc/svc.h says that
sp_id is also the NUMA node id, that is not necessarily true if the svc is
created with svc_create_pooled(). svc_create_pooled() can use the per-cpu
pool mode, and in that case sp_id is the CPU id, not a node id.
From __svc_create():

	for (i = 0; i < serv->sv_nrpools; i++) {
		struct svc_pool *pool = &serv->sv_pools[i];

		dprintk("svc: initialising pool %u for %s\n",
				i, serv->sv_name);

		pool->sp_id = i;
When the per-cpu pool mode is used, this triggers a BUG on my machine:
BUG: unable to handle page fault for address: 0000000000002088
#7 [ffffafa3dc42fc90] asm_exc_page_fault at ffffffffa3e00bc7
[exception RIP: __next_zones_zonelist+9]
RIP: ffffffffa32fbbc9 RSP: ffffafa3dc42fd48 RFLAGS: 00010286
RAX: 0000000000002080 RBX: 0000000000000000 RCX: ffff8ba5f22bafc0
RDX: ffff8ba5f22bafc0 RSI: 0000000000000002 RDI: 0000000000002080
RBP: ffffafa3dc42fdc0 R8: 0000000000002080 R9: ffff8ba62138c2d8
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000cc0
R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffffafa3dc42fd50] __alloc_pages at ffffffffa334c122
#9 [ffffafa3dc42fdc8] __alloc_pages_bulk at ffffffffa334c519
#10 [ffffafa3dc42fe58] svc_alloc_arg at ffffffffc0afc0d7 [sunrpc]
#11 [ffffafa3dc42fea0] svc_recv at ffffffffc0afe08d [sunrpc]
#12 [ffffafa3dc42fec8] nfsd at ffffffffc0dec469 [nfsd]
#13 [ffffafa3dc42fee8] kthread at ffffffffa30e4826
I believe the fix is to expose svc_pool_map_get_node() and use it in
the alloc_pages_bulk_array_node() call in svc_xprt.c. Reverting 5f7fc5d
would obviously work as well.
The comment in svc.h should probably be updated as well, since it is misleading.
I didn't provide a patch because I wasn't sure which approach you would
prefer, but I can provide one if that's helpful.
HTH
Guillaume.
--
Guillaume Morin
On Mon, Dec 18, 2023 at 10:46:22PM +0100, Guillaume Morin wrote:
> I believe the fix is to expose svc_pool_map_get_node() and use that in
> the alloc_pages_bulk_array_node() call in svc_xprt.c. Reverting 5f7fc5d
> would obviously work as well.

Reverted and applied for v6.7-rc (see my nfsd-fixes branch). Thanks
for the report and analysis!
--
Chuck Lever