From: "NeilBrown"
To: "Trond Myklebust"
Cc: "linux-nfs@vger.kernel.org", "aglo@umich.edu", "Anna.Schumaker@netapp.com", "schumaker.anna@gmail.com"
Subject: Re: [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression
In-reply-to:
References: <20230917230551.30483-1-trondmy@kernel.org>, <20230917230551.30483-2-trondmy@kernel.org>, , <9eda74d7438ee0a82323058b9d4c2b98f4e434cf.camel@hammerspace.com>, ,
 <077cb75b44afd2404629c1388a92ca61da5092b1.camel@hammerspace.com>, <169568091982.19404.4821745630158429694@noble.neil.brown.name>,
Date: Tue, 26 Sep 2023 09:04:05 +1000
Message-id: <169568304501.19404.1610884104930799751@noble.neil.brown.name>
X-Mailing-List: linux-nfs@vger.kernel.org

On Tue, 26 Sep 2023, Trond Myklebust wrote:
> On Tue, 2023-09-26 at 08:28 +1000, NeilBrown wrote:
> > On Sat, 23 Sep 2023, Trond Myklebust wrote:
> > > On Fri, 2023-09-22 at 13:22 -0400, Olga Kornievskaia wrote:
> > > > On Wed, Sep 20, 2023 at 8:27 PM Trond Myklebust wrote:
> > > > >
> > > > > On Wed, 2023-09-20 at 15:38 -0400, Anna Schumaker wrote:
> > > > > > Hi Trond,
> > > > > >
> > > > > > On Sun, Sep 17, 2023 at 7:12 PM wrote:
> > > > > > >
> > > > > > > From: Trond Myklebust
> > > > > > >
> > > > > > > Commit 4dc73c679114 reintroduces the deadlock that was fixed by
> > > > > > > commit aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> > > > > > > because it prevents the setup of new threads to handle reboot
> > > > > > > recovery, while the older recovery thread is stuck returning
> > > > > > > delegations.
> > > > > >
> > > > > > I'm seeing a possible deadlock with xfstests generic/472 on NFS v4.x
> > > > > > after applying this patch. The test itself checks for various
> > > > > > swapfile edge cases, so it seems likely something is going on there.
> > > > > >
> > > > > > Let me know if you need more info
> > > > > > Anna
> > > > > >
> > > > >
> > > > > Did you turn off delegations on your server? If you don't, then swap
> > > > > will deadlock itself under various scenarios.
> > > >
> > > > Is there documentation somewhere that says that delegations must be
> > > > turned off on the server if NFS over swap is enabled?
> > >
> > > I think the question is more generally "Is there documentation for NFS
> > > swap?"
> >
> > The main difference between using NFS for swap and for regular file IO
> > is in the handling of writes, and particularly in the style of memory
> > allocation that is safe while handling a write request (or anything
> > which might block some write request, etc).
> >
> > For buffered IO, memory allocations must be GFP_NOIO or PF_MEMALLOC_NOIO.
> > For swap-out, memory allocations must be GFP_MEMALLOC or PF_MEMALLOC.
> >
> > That is the primary difference - all other differences are minor. This
> > difference might justify documentation suggesting that
> > /proc/sys/vm/min_free_kbytes could usefully be increased, but I don't
> > see that more is needed.
> >
> > The NOIO/MEMALLOC distinction is properly plumbed through nfs, sunrpc,
> > and networking and all "just works". The problem area is that
> > kthread_create() doesn't take a gfp_t argument, so it uses
> > GFP_KERNEL allocations to create the new thread.
> >
> > This means that when a write cannot proceed without state management,
> > and state management requests that a thread be started, there is a risk
> > of memory allocation deadlock.
> > I believe the risk is there even for buffered IO, but I'm not 100%
> > certain and in practice I don't think a deadlock has ever been reported.
> > With swap-out it is fairly easy to trigger a deadlock if there is heavy
> > swap-out traffic when state management is needed.
> >
> > The common pattern in the kernel when a thread might be needed to
> > support writeout is to keep the thread running permanently (rather than
> > to add a gfp_t to kthread_create), so that is what I added to the
> > NFSv4 state manager.
> >
> > However the state manager thread has a second use - returning
> > delegations. This sometimes needs to run concurrently with state
> > management, so one thread is not enough.
> >
> > What is the context for delegation return? Does it ever block writes?
> > If it doesn't, would it make sense to use a work queue for returning
> > delegations - maybe system_wq?
>
> These are potentially long-lived processes because there may be lock
> recovery involved, and because of the conflict with state recovery, so
> it does not make sense to put them on a workqueue.
>

Makes sense - thanks.

Are writes blocked while the delegation return proceeds? If not, would
it be reasonable to start a separate kthread on-demand when a return is
requested?

Thanks,
NeilBrown
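
P.S. For readers following the allocation argument quoted above, here is a
minimal userspace sketch of why starting a thread on demand can stall a
swap-out write while a permanently running thread cannot. It is a toy model
under stated assumptions, not kernel code: every name in it (toy_pool,
toy_alloc, toy_manager, toy_start_thread, toy_swap_out) is hypothetical, the
page counts are arbitrary, and a "false" return stands in for the point where
a real GFP_KERNEL allocation would block in reclaim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
enum toy_mode {
    TOY_NORMAL,   /* stands in for a GFP_KERNEL-style request */
    TOY_RESERVE,  /* stands in for a GFP_MEMALLOC-style request */
};

struct toy_pool {
    unsigned int free_pages;  /* pages currently free */
    unsigned int min_free;    /* reserve floor, loosely like min_free_kbytes */
};

/* Returns true if the request can be met right now without reclaim. */
static bool toy_alloc(struct toy_pool *pool, enum toy_mode mode,
                      unsigned int pages)
{
    /* Only reserve-style requests may dip below the floor. */
    unsigned int floor = (mode == TOY_RESERVE) ? 0 : pool->min_free;

    assert(pool != NULL);
    if (pool->free_pages < pages || pool->free_pages - pages < floor)
        return false;
    pool->free_pages -= pages;
    return true;
}

struct toy_manager {
    bool thread_running;  /* true if the manager thread already exists */
};

/* Starting a thread on demand needs a normal (non-reserve) allocation. */
static bool toy_start_thread(struct toy_pool *pool, struct toy_manager *m)
{
    if (m->thread_running)
        return true;               /* nothing to allocate */
    if (!toy_alloc(pool, TOY_NORMAL, 4))
        return false;              /* a real kernel would wait in reclaim here */
    m->thread_running = true;
    return true;
}

/* A swap-out write that cannot proceed until state management has run. */
static bool toy_swap_out(struct toy_pool *pool, struct toy_manager *m)
{
    if (!toy_start_thread(pool, m))
        return false;              /* write stalls behind thread creation */
    return toy_alloc(pool, TOY_RESERVE, 1);
}
```

With free_pages equal to min_free, toy_swap_out() fails for a manager whose
thread is not yet running and succeeds for one whose thread already exists,
which is the asymmetry the permanent-thread approach relies on. The model
says nothing about delegation return, which is the open question in this
thread.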