2022-02-24 16:58:02

by Amir Goldstein

[permalink] [raw]
Subject: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init

The nfsd file cache table can be pretty large and its allocation
may require as many as 80 contigious pages.

Employ the same fix that was employed for similar issue that was
reported for the reply cache hash table allocation several years ago
by commit 8f97514b423a ("nfsd: more robust allocation failure handling
in nfsd_reply_cache_init").

Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
Link: https://lore.kernel.org/linux-nfs/[email protected]/
Suggested-by: Jeff Layton <[email protected]>
Signed-off-by: Amir Goldstein <[email protected]>
---

Since v1:
- Use kvcalloc()
- Use kvfree()

fs/nfsd/filecache.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 8bc807c5fea4..cc2831cec669 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -632,7 +632,7 @@ nfsd_file_cache_init(void)
if (!nfsd_filecache_wq)
goto out;

- nfsd_file_hashtbl = kcalloc(NFSD_FILE_HASH_SIZE,
+ nfsd_file_hashtbl = kvcalloc(NFSD_FILE_HASH_SIZE,
sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
if (!nfsd_file_hashtbl) {
pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n");
@@ -700,7 +700,7 @@ nfsd_file_cache_init(void)
nfsd_file_slab = NULL;
kmem_cache_destroy(nfsd_file_mark_slab);
nfsd_file_mark_slab = NULL;
- kfree(nfsd_file_hashtbl);
+ kvfree(nfsd_file_hashtbl);
nfsd_file_hashtbl = NULL;
destroy_workqueue(nfsd_filecache_wq);
nfsd_filecache_wq = NULL;
@@ -811,7 +811,7 @@ nfsd_file_cache_shutdown(void)
fsnotify_wait_marks_destroyed();
kmem_cache_destroy(nfsd_file_mark_slab);
nfsd_file_mark_slab = NULL;
- kfree(nfsd_file_hashtbl);
+ kvfree(nfsd_file_hashtbl);
nfsd_file_hashtbl = NULL;
destroy_workqueue(nfsd_filecache_wq);
nfsd_filecache_wq = NULL;
--
2.25.1


2022-02-24 21:30:33

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init

Hi Amir-

> On Feb 24, 2022, at 11:17 AM, Amir Goldstein <[email protected]> wrote:
>
> The nfsd file cache table can be pretty large and its allocation
> may require as many as 80 contigious pages.
>
> Employ the same fix that was employed for similar issue that was
> reported for the reply cache hash table allocation several years ago
> by commit 8f97514b423a ("nfsd: more robust allocation failure handling
> in nfsd_reply_cache_init").
>
> Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
> Link: https://lore.kernel.org/linux-nfs/[email protected]/
> Suggested-by: Jeff Layton <[email protected]>
> Signed-off-by: Amir Goldstein <[email protected]>
> ---
>
> Since v1:
> - Use kvcalloc()
> - Use kvfree()
>
> fs/nfsd/filecache.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)

v2 passes some simple testing, so I've applied it to NFSD for-next.
It should get 0-day and merge testing and is available for others
to try out.

I don't have anything that exercises low memory scenarios, though.
Do you have anything like this to try?


> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 8bc807c5fea4..cc2831cec669 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -632,7 +632,7 @@ nfsd_file_cache_init(void)
> if (!nfsd_filecache_wq)
> goto out;
>
> - nfsd_file_hashtbl = kcalloc(NFSD_FILE_HASH_SIZE,
> + nfsd_file_hashtbl = kvcalloc(NFSD_FILE_HASH_SIZE,
> sizeof(*nfsd_file_hashtbl), GFP_KERNEL);
> if (!nfsd_file_hashtbl) {
> pr_err("nfsd: unable to allocate nfsd_file_hashtbl\n");
> @@ -700,7 +700,7 @@ nfsd_file_cache_init(void)
> nfsd_file_slab = NULL;
> kmem_cache_destroy(nfsd_file_mark_slab);
> nfsd_file_mark_slab = NULL;
> - kfree(nfsd_file_hashtbl);
> + kvfree(nfsd_file_hashtbl);
> nfsd_file_hashtbl = NULL;
> destroy_workqueue(nfsd_filecache_wq);
> nfsd_filecache_wq = NULL;
> @@ -811,7 +811,7 @@ nfsd_file_cache_shutdown(void)
> fsnotify_wait_marks_destroyed();
> kmem_cache_destroy(nfsd_file_mark_slab);
> nfsd_file_mark_slab = NULL;
> - kfree(nfsd_file_hashtbl);
> + kvfree(nfsd_file_hashtbl);
> nfsd_file_hashtbl = NULL;
> destroy_workqueue(nfsd_filecache_wq);
> nfsd_filecache_wq = NULL;
> --
> 2.25.1
>

--
Chuck Lever



2022-02-24 23:08:31

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init



> On Feb 24, 2022, at 4:49 PM, NeilBrown <[email protected]> wrote:
>
> On Fri, 25 Feb 2022, Amir Goldstein wrote:
>> The nfsd file cache table can be pretty large and its allocation
>> may require as many as 80 contigious pages.
>>
>> Employ the same fix that was employed for similar issue that was
>> reported for the reply cache hash table allocation several years ago
>> by commit 8f97514b423a ("nfsd: more robust allocation failure handling
>> in nfsd_reply_cache_init").
>>
>> Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
>> Link: https://lore.kernel.org/linux-nfs/[email protected]/
>> Suggested-by: Jeff Layton <[email protected]>
>> Signed-off-by: Amir Goldstein <[email protected]>
>> ---
>>
>> Since v1:
>> - Use kvcalloc()
>> - Use kvfree()
>
> I think this is a good improvement, but it would be really nice to
> replace this bespoke hash table with an rhashtable. They we wouldn't
> need to worry about these trivial details.

I agree -- I didn't want to saddle Jeff or Amir with pulling
on that chain. But I'm willing to review patches that attempt
that kind of replacement (same for the DRC hash table).


--
Chuck Lever



2022-02-25 00:20:36

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init

On Fri, 25 Feb 2022, Amir Goldstein wrote:
> The nfsd file cache table can be pretty large and its allocation
> may require as many as 80 contigious pages.
>
> Employ the same fix that was employed for similar issue that was
> reported for the reply cache hash table allocation several years ago
> by commit 8f97514b423a ("nfsd: more robust allocation failure handling
> in nfsd_reply_cache_init").
>
> Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
> Link: https://lore.kernel.org/linux-nfs/[email protected]/
> Suggested-by: Jeff Layton <[email protected]>
> Signed-off-by: Amir Goldstein <[email protected]>
> ---
>
> Since v1:
> - Use kvcalloc()
> - Use kvfree()

I think this is a good improvement, but it would be really nice to
replace this bespoke hash table with an rhashtable. They we wouldn't
need to worry about these trivial details.

NeilBrown

2022-02-25 01:38:11

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init

On Thu, Feb 24, 2022 at 10:41 PM Chuck Lever III <[email protected]> wrote:
>
> Hi Amir-
>
> > On Feb 24, 2022, at 11:17 AM, Amir Goldstein <[email protected]> wrote:
> >
> > The nfsd file cache table can be pretty large and its allocation
> > may require as many as 80 contigious pages.
> >
> > Employ the same fix that was employed for similar issue that was
> > reported for the reply cache hash table allocation several years ago
> > by commit 8f97514b423a ("nfsd: more robust allocation failure handling
> > in nfsd_reply_cache_init").
> >
> > Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
> > Link: https://lore.kernel.org/linux-nfs/[email protected]/
> > Suggested-by: Jeff Layton <[email protected]>
> > Signed-off-by: Amir Goldstein <[email protected]>
> > ---
> >
> > Since v1:
> > - Use kvcalloc()
> > - Use kvfree()
> >
> > fs/nfsd/filecache.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
>
> v2 passes some simple testing, so I've applied it to NFSD for-next.
> It should get 0-day and merge testing and is available for others
> to try out.
>
> I don't have anything that exercises low memory scenarios, though.
> Do you have anything like this to try?

Well, it is not low memory really it's fragmented memory.
I would try setting:

CONFIG_FAIL_PAGE_ALLOC=y

echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
echo 100 > /sys/kernel/debug/fail_page_alloc/probability

and starting (or restarting) nfsd.
hoping that other large page allocations won't get in the way.

I gave it a shot, but couldn't figure out why nfsd4_files slab
is still there after stopping nfs-server service, meaning that
nfsd_file_cache_shutdown() was not called - I must be missing
something. I may play with this some more tomorrow.

Thanks,
Amir.

2022-02-26 20:15:40

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init

On Thu, Feb 24, 2022 at 11:39 PM Amir Goldstein <[email protected]> wrote:
>
> On Thu, Feb 24, 2022 at 10:41 PM Chuck Lever III <[email protected]> wrote:
> >
> > Hi Amir-
> >
> > > On Feb 24, 2022, at 11:17 AM, Amir Goldstein <[email protected]> wrote:
> > >
> > > The nfsd file cache table can be pretty large and its allocation
> > > may require as many as 80 contigious pages.
> > >
> > > Employ the same fix that was employed for similar issue that was
> > > reported for the reply cache hash table allocation several years ago
> > > by commit 8f97514b423a ("nfsd: more robust allocation failure handling
> > > in nfsd_reply_cache_init").
> > >
> > > Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
> > > Link: https://lore.kernel.org/linux-nfs/[email protected]/
> > > Suggested-by: Jeff Layton <[email protected]>
> > > Signed-off-by: Amir Goldstein <[email protected]>
> > > ---
> > >
> > > Since v1:
> > > - Use kvcalloc()
> > > - Use kvfree()
> > >
> > > fs/nfsd/filecache.c | 6 +++---
> > > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > v2 passes some simple testing, so I've applied it to NFSD for-next.
> > It should get 0-day and merge testing and is available for others
> > to try out.
> >
> > I don't have anything that exercises low memory scenarios, though.
> > Do you have anything like this to try?
>
> Well, it is not low memory really it's fragmented memory.
> I would try setting:
>
> CONFIG_FAIL_PAGE_ALLOC=y
>
> echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
> echo 100 > /sys/kernel/debug/fail_page_alloc/probability
>
> and starting (or restarting) nfsd.
> hoping that other large page allocations won't get in the way.
>
> I gave it a shot, but couldn't figure out why nfsd4_files slab
> is still there after stopping nfs-server service, meaning that
> nfsd_file_cache_shutdown() was not called - I must be missing
> something. I may play with this some more tomorrow.
>

Ok, I was missing some parameters.
This configuration reproduces and failure and verified that the
kvcalloc() fix solves the issue:

$ systemctl stop nfs-server
$ echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
$ echo 100 > /sys/kernel/debug/fail_page_alloc/probability
$ echo 1 > /sys/kernel/debug/fail_page_alloc/times
$ echo N > /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait
$ systemctl start nfs-server

[ 24.410560] FAULT_INJECTION: forcing a failure.
[ 24.410560] name fail_page_alloc, interval 1, probability 100,
space 0, times 1
[ 24.413887] CPU: 1 PID: 1218 Comm: rpc.nfsd Not tainted
5.17.0-rc2-xfstests #5927
[ 24.415625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 24.417098] Call Trace:
[ 24.417098] <TASK>
[ 24.417098] dump_stack_lvl+0x45/0x59
[ 24.418999] should_fail+0x11a/0x13d
[ 24.418999] prepare_alloc_pages.isra.0+0x97/0xc5
[ 24.418999] __alloc_pages+0x76/0x1c7
[ 24.418999] kmalloc_order+0x35/0xa7
[ 24.418999] kmalloc_order_trace+0x1b/0xf3
[ 24.418999] nfsd_file_cache_init+0x5b/0x2d8
[ 24.418999] nfsd_svc+0xcd/0x2b2
[ 24.427086] write_threads+0x6d/0xb5
[ 24.427086] ? get_int+0x70/0x70
[ 24.429020] nfsctl_transaction_write+0x4f/0x67
[ 24.429020] vfs_write+0xe3/0x14b
[ 24.429020] ksys_write+0x7f/0xcb
[ 24.429020] do_syscall_64+0x6d/0x80
[ 24.429020] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 24.429020] RIP: 0033:0x7f29d80d6504
[ 24.429020] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f
1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00
00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89
f5 53
[ 24.439028] RSP: 002b:00007ffe867a47f8 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 24.439028] RAX: ffffffffffffffda RBX: 00007f29d8219560 RCX: 00007f29d80d6504
[ 24.442325] RDX: 0000000000000002 RSI: 00007f29d8219560 RDI: 0000000000000003
[ 24.442325] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffe867a4557
[ 24.445644] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 24.445644] R13: 0000000000000008 R14: 0000000000000000 R15: 00007f29d83572a0
[ 24.449026] </TASK>
[ 24.450496] nfsd: unable to allocate nfsd_file_hashtbl
Job for nfs-server.service canceled.

With the fix patch, nfsd starts despite the injected failure.

Thanks,
Amir.

2022-02-26 20:17:54

by Chuck Lever III

[permalink] [raw]
Subject: Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init



> On Feb 26, 2022, at 1:37 PM, Amir Goldstein <[email protected]> wrote:
>
> On Thu, Feb 24, 2022 at 11:39 PM Amir Goldstein <[email protected]> wrote:
>>
>> On Thu, Feb 24, 2022 at 10:41 PM Chuck Lever III <[email protected]> wrote:
>>>
>>> Hi Amir-
>>>
>>>> On Feb 24, 2022, at 11:17 AM, Amir Goldstein <[email protected]> wrote:
>>>>
>>>> The nfsd file cache table can be pretty large and its allocation
>>>> may require as many as 80 contigious pages.
>>>>
>>>> Employ the same fix that was employed for similar issue that was
>>>> reported for the reply cache hash table allocation several years ago
>>>> by commit 8f97514b423a ("nfsd: more robust allocation failure handling
>>>> in nfsd_reply_cache_init").
>>>>
>>>> Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
>>>> Link: https://lore.kernel.org/linux-nfs/[email protected]/
>>>> Suggested-by: Jeff Layton <[email protected]>
>>>> Signed-off-by: Amir Goldstein <[email protected]>
>>>> ---
>>>>
>>>> Since v1:
>>>> - Use kvcalloc()
>>>> - Use kvfree()
>>>>
>>>> fs/nfsd/filecache.c | 6 +++---
>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> v2 passes some simple testing, so I've applied it to NFSD for-next.
>>> It should get 0-day and merge testing and is available for others
>>> to try out.
>>>
>>> I don't have anything that exercises low memory scenarios, though.
>>> Do you have anything like this to try?
>>
>> Well, it is not low memory really it's fragmented memory.
>> I would try setting:
>>
>> CONFIG_FAIL_PAGE_ALLOC=y
>>
>> echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
>> echo 100 > /sys/kernel/debug/fail_page_alloc/probability
>>
>> and starting (or restarting) nfsd.
>> hoping that other large page allocations won't get in the way.
>>
>> I gave it a shot, but couldn't figure out why nfsd4_files slab
>> is still there after stopping nfs-server service, meaning that
>> nfsd_file_cache_shutdown() was not called - I must be missing
>> something. I may play with this some more tomorrow.
>>
>
> Ok, I was missing some parameters.
> This configuration reproduces and failure and verified that the
> kvcalloc() fix solves the issue:
>
> $ systemctl stop nfs-server
> $ echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
> $ echo 100 > /sys/kernel/debug/fail_page_alloc/probability
> $ echo 1 > /sys/kernel/debug/fail_page_alloc/times
> $ echo N > /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait
> $ systemctl start nfs-server
>
> [ 24.410560] FAULT_INJECTION: forcing a failure.
> [ 24.410560] name fail_page_alloc, interval 1, probability 100,
> space 0, times 1
> [ 24.413887] CPU: 1 PID: 1218 Comm: rpc.nfsd Not tainted
> 5.17.0-rc2-xfstests #5927
> [ 24.415625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.13.0-1ubuntu1.1 04/01/2014
> [ 24.417098] Call Trace:
> [ 24.417098] <TASK>
> [ 24.417098] dump_stack_lvl+0x45/0x59
> [ 24.418999] should_fail+0x11a/0x13d
> [ 24.418999] prepare_alloc_pages.isra.0+0x97/0xc5
> [ 24.418999] __alloc_pages+0x76/0x1c7
> [ 24.418999] kmalloc_order+0x35/0xa7
> [ 24.418999] kmalloc_order_trace+0x1b/0xf3
> [ 24.418999] nfsd_file_cache_init+0x5b/0x2d8
> [ 24.418999] nfsd_svc+0xcd/0x2b2
> [ 24.427086] write_threads+0x6d/0xb5
> [ 24.427086] ? get_int+0x70/0x70
> [ 24.429020] nfsctl_transaction_write+0x4f/0x67
> [ 24.429020] vfs_write+0xe3/0x14b
> [ 24.429020] ksys_write+0x7f/0xcb
> [ 24.429020] do_syscall_64+0x6d/0x80
> [ 24.429020] entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 24.429020] RIP: 0033:0x7f29d80d6504
> [ 24.429020] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f
> 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00
> 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89
> f5 53
> [ 24.439028] RSP: 002b:00007ffe867a47f8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000001
> [ 24.439028] RAX: ffffffffffffffda RBX: 00007f29d8219560 RCX: 00007f29d80d6504
> [ 24.442325] RDX: 0000000000000002 RSI: 00007f29d8219560 RDI: 0000000000000003
> [ 24.442325] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffe867a4557
> [ 24.445644] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [ 24.445644] R13: 0000000000000008 R14: 0000000000000000 R15: 00007f29d83572a0
> [ 24.449026] </TASK>
> [ 24.450496] nfsd: unable to allocate nfsd_file_hashtbl
> Job for nfs-server.service canceled.
>
> With the fix patch, nfsd starts despite the injected failure.

Thanks. I will add your Tested-by: .


--
Chuck Lever