From: "NeilBrown"
To: "Trond Myklebust"
Cc: "aglo@umich.edu", "linux-nfs@vger.kernel.org",
    "Anna.Schumaker@netapp.com", "schumaker.anna@gmail.com"
Subject: Re: [PATCH 2/2] NFSv4: Fix a state manager thread deadlock regression
In-reply-to: <077cb75b44afd2404629c1388a92ca61da5092b1.camel@hammerspace.com>
Date: Tue, 26 Sep 2023 08:28:39 +1000
Message-id: <169568091982.19404.4821745630158429694@noble.neil.brown.name>
X-Mailing-List: linux-nfs@vger.kernel.org

On Sat, 23 Sep 2023, Trond Myklebust wrote:
> On Fri, 2023-09-22 at 13:22 -0400, Olga Kornievskaia wrote:
> > On Wed, Sep 20, 2023 at 8:27 PM Trond Myklebust wrote:
> > >
> > > On Wed, 2023-09-20 at 15:38 -0400, Anna Schumaker wrote:
> > > > Hi Trond,
> > > >
> > > > On Sun, Sep 17, 2023 at 7:12 PM wrote:
> > > > >
> > > > > From: Trond Myklebust
> > > > >
> > > > > Commit 4dc73c679114 reintroduces the deadlock that was fixed by
> > > > > commit aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> > > > > because it prevents the setup of new threads to handle reboot
> > > > > recovery, while the older recovery thread is stuck returning
> > > > > delegations.
> > > >
> > > > I'm seeing a possible deadlock with xfstests generic/472 on NFS
> > > > v4.x after applying this patch. The test itself checks for various
> > > > swapfile edge cases, so it seems likely something is going on there.
> > > >
> > > > Let me know if you need more info
> > > > Anna
> > > >
> > >
> > > Did you turn off delegations on your server? If you don't, then swap
> > > will deadlock itself under various scenarios.
> >
> > Is there documentation somewhere that says that delegations must be
> > turned off on the server if NFS over swap is enabled?
>
> I think the question is more generally "Is there documentation for NFS
> swap?"

The main difference between using NFS for swap and for regular file IO
is in the handling of writes, and particularly in the style of memory
allocation that is safe while handling a write request (or anything
which might block some write request, etc).

For buffered IO, memory allocations must be GFP_NOIO or PF_MEMALLOC_NOIO.
For swap-out, memory allocations must be __GFP_MEMALLOC or PF_MEMALLOC.

That is the primary difference - all other differences are minor. This
difference might justify documentation suggesting that
/proc/sys/vm/min_free_kbytes could usefully be increased, but I don't
see that more is needed.

The NOIO/MEMALLOC distinction is properly plumbed through nfs, sunrpc,
and networking and all "just works". The problem area is that
kthread_create() doesn't take a gfp_t argument, so it uses GFP_KERNEL
allocations to create the new thread.

This means that when a write cannot proceed without state management,
and state management requests that a thread be started, there is a risk
of memory allocation deadlock. I believe the risk is there even for
buffered IO, but I'm not 100% certain and in practice I don't think a
deadlock has ever been reported. With swap-out it is fairly easy to
trigger a deadlock if there is heavy swap-out traffic when state
management is needed.

The common pattern in the kernel when a thread might be needed to
support writeout is to keep the thread running permanently (rather than
to add a gfp_t to kthread_create), so that is what I added to the nfsv4
state manager.

However the state manager thread has a second use - returning
delegations. This sometimes needs to run concurrently with state
management, so one thread is not enough.

What is the context for delegation return? Does it ever block writes?
If it doesn't, would it make sense to use a work queue for returning
delegations - maybe system_wq?

I think it might be best to have the nfsv4 state manager thread always
be running whether swap is enabled or not. As I say I think there is at
least a theoretical risk of a deadlock even without swap, and having a
small test matrix is usually a good thing.

Thanks,
NeilBrown