From: Amir Goldstein
Date: Sat, 26 Feb 2022 20:37:12 +0200
Subject: Re: [PATCH v2] nfsd: more robust allocation failure handling in nfsd_file_cache_init
To: Chuck Lever III
Cc: Jeff Layton, Linux NFS Mailing List
References: <20220224161705.1041788-1-amir73il@gmail.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On Thu, Feb 24, 2022 at 11:39 PM Amir Goldstein wrote:
>
> On Thu, Feb 24, 2022 at 10:41 PM Chuck Lever III wrote:
> >
> > Hi Amir-
> >
> > > On Feb 24, 2022, at 11:17 AM, Amir Goldstein wrote:
> > >
> > > The nfsd file cache table can be pretty large and its allocation
> > > may require as many as 80 contigious pages.
> > >
> > > Employ the same fix that was employed for similar issue that was
> > > reported for the reply cache hash table allocation several years ago
> > > by commit 8f97514b423a ("nfsd: more robust allocation failure handling
> > > in nfsd_reply_cache_init").
> > >
> > > Fixes: 65294c1f2c5e ("nfsd: add a new struct file caching facility to nfsd")
> > > Link: https://lore.kernel.org/linux-nfs/e3cdaeec85a6cfec980e87fc294327c0381c1778.camel@kernel.org/
> > > Suggested-by: Jeff Layton
> > > Signed-off-by: Amir Goldstein
> > > ---
> > >
> > > Since v1:
> > > - Use kvcalloc()
> > > - Use kvfree()
> > >
> > > fs/nfsd/filecache.c | 6 +++---
> > > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > v2 passes some simple testing, so I've applied it to NFSD for-next.
> > It should get 0-day and merge testing and is available for others
> > to try out.
> >
> > I don't have anything that exercises low memory scenarios, though.
> > Do you have anything like this to try?
>
> Well, it is not low memory really it's fragmented memory.
> I would try setting:
>
> CONFIG_FAIL_PAGE_ALLOC=y
>
> echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
> echo 100 > /sys/kernel/debug/fail_page_alloc/probability
>
> and starting (or restarting) nfsd.
> hoping that other large page allocations won't get in the way.
>
> I gave it a shot, but couldn't figure out why nfsd4_files slab
> is still there after stopping nfs-server service, meaning that
> nfsd_file_cache_shutdown() was not called - I must be missing
> something. I may play with this some more tomorrow.
>

Ok, I was missing some parameters.
This configuration reproduces the failure, and I verified that
the kvcalloc() fix solves the issue:

$ systemctl stop nfs-server
$ echo 5 > /sys/kernel/debug/fail_page_alloc/min-order
$ echo 100 > /sys/kernel/debug/fail_page_alloc/probability
$ echo 1 > /sys/kernel/debug/fail_page_alloc/times
$ echo N > /sys/kernel/debug/fail_page_alloc/ignore-gfp-wait
$ systemctl start nfs-server

[   24.410560] FAULT_INJECTION: forcing a failure.
[   24.410560] name fail_page_alloc, interval 1, probability 100, space 0, times 1
[   24.413887] CPU: 1 PID: 1218 Comm: rpc.nfsd Not tainted 5.17.0-rc2-xfstests #5927
[   24.415625] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   24.417098] Call Trace:
[   24.417098]
[   24.417098]  dump_stack_lvl+0x45/0x59
[   24.418999]  should_fail+0x11a/0x13d
[   24.418999]  prepare_alloc_pages.isra.0+0x97/0xc5
[   24.418999]  __alloc_pages+0x76/0x1c7
[   24.418999]  kmalloc_order+0x35/0xa7
[   24.418999]  kmalloc_order_trace+0x1b/0xf3
[   24.418999]  nfsd_file_cache_init+0x5b/0x2d8
[   24.418999]  nfsd_svc+0xcd/0x2b2
[   24.427086]  write_threads+0x6d/0xb5
[   24.427086]  ? get_int+0x70/0x70
[   24.429020]  nfsctl_transaction_write+0x4f/0x67
[   24.429020]  vfs_write+0xe3/0x14b
[   24.429020]  ksys_write+0x7f/0xcb
[   24.429020]  do_syscall_64+0x6d/0x80
[   24.429020]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   24.429020] RIP: 0033:0x7f29d80d6504
[   24.429020] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 48 8d 05 f9 61 0d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53
[   24.439028] RSP: 002b:00007ffe867a47f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   24.439028] RAX: ffffffffffffffda RBX: 00007f29d8219560 RCX: 00007f29d80d6504
[   24.442325] RDX: 0000000000000002 RSI: 00007f29d8219560 RDI: 0000000000000003
[   24.442325] RBP: 0000000000000003 R08: 0000000000000000 R09: 00007ffe867a4557
[   24.445644] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[   24.445644] R13: 0000000000000008 R14: 0000000000000000 R15: 00007f29d83572a0
[   24.449026]
[   24.450496] nfsd: unable to allocate nfsd_file_hashtbl
Job for nfs-server.service canceled.

With the fix patch applied, nfsd starts despite the injected failure.

Thanks,
Amir.
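
P.S. A rough sketch of why min-order 5 is enough to catch this allocation. It assumes 4 KiB pages and that a physically contiguous request is rounded up to the next power-of-two number of pages; the helper names are made up for illustration and are not kernel functions. Meeting min-order is only one of the fail_page_alloc filters, so this shows a necessary condition, not the whole decision.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical for x86-64


def order_for_pages(nr_pages: int) -> int:
    """Smallest order such that 2**order pages cover nr_pages (hypothetical helper)."""
    order = 0
    while (1 << order) < nr_pages:
        order += 1
    return order


def meets_min_order(nr_pages: int, min_order: int) -> bool:
    """True if the request is large enough to be considered for injection.

    Only models the min-order filter; probability, times and the gfp
    filters are ignored here.
    """
    return order_for_pages(nr_pages) >= min_order


if __name__ == "__main__":
    pages = 80      # "as many as 80 contiguous pages" from the patch description
    min_order = 5   # value written to fail_page_alloc/min-order
    order = order_for_pages(pages)
    size_kib = (1 << order) * PAGE_SIZE // 1024
    print(f"{pages} pages -> order {order} ({size_kib} KiB block)")
    print("meets min-order:", meets_min_order(pages, min_order))
```

Under those assumptions an 80-page table needs an order-7 block, which is above the min-order of 5, so the kmalloc path is eligible for the injected failure while smaller allocations during nfsd startup are left alone.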