Date: Wed, 25 Oct 2023 19:39:31 +0200
From: Oleg Nesterov
To: Chuck Lever
Cc: Jeff Layton, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey, Ingo Molnar, linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: nfsd_copy_write_verifier: wrong usage of read_seqbegin_or_lock()
Message-ID: <20231025173931.GA29779@redhat.com>
References: <20231025163006.GA8279@redhat.com>

Hi Chuck,

Thanks for your reply. But I am already sleeping and I can't understand it.
So let me ask a couple of questions.

1. Do you agree that the current nfsd_copy_write_verifier() code makes no sense?

   I mean, the usage of read_seqbegin_or_lock() suggests that if the lockless
   pass fails it should take writeverf_lock for writing. But this can't happen,
   and thus this code doesn't look right no matter what. None of the
   read_seqbegin_or_lock/need_seqretry/done_seqretry helpers make any sense
   because "seq" is always even.

2. If yes, which change do you prefer? I'd prefer the patch at the end.

Oleg.

On 10/25, Chuck Lever wrote:
>
> On Wed, Oct 25, 2023 at 06:30:06PM +0200, Oleg Nesterov wrote:
> > Hello,
> >
> > The usage of writeverf_lock is wrong and misleading no matter what and
> > I can not understand the intent.
>
> The structure of the seqlock was introduced in commit 27c438f53e79
> ("nfsd: Support the server resetting the boot verifier").
>
> The NFS write verifier is an 8-byte cookie that is supposed to
> indicate the boot epoch of the server -- simply put, when the server
> restarts, the epoch (and this verifier) changes.
>
> NFSv3 and later have a two-phase write scheme where the client
> sends data to the server (known as an UNSTABLE WRITE), then later
> asks the server to commit that data (a COMMIT). Before the COMMIT,
> that data is not durable and the client must hold onto it until
> the server's COMMIT Reply indicates it's safe for the client to
> discard that data and move on.
>
> When an UNSTABLE WRITE is done, the server reports its current
> epoch as part of each WRITE Reply. If this verifier cookie changes,
> the client knows that the server might have lost previously
> written-but-uncommitted data, so it must send the WRITEs
> again in that (rare) case.
>
> NFSD abuses this slightly by changing the write verifier whenever
> there is an underlying local write error that might have occurred in
> the background (ie, there was no WRITE or COMMIT operation at the
> time that the server could use to convey the error back to the
> client). This is supposed to trigger clients to send UNSTABLE WRITEs
> again to ensure that data is properly committed to durable storage.
>
> The point of the seqlock is to ensure that
>
> a) a write verifier update does not tear the verifier
> b) a write verifier read does not see a torn verifier
>
> This is a hot path, so we don't want a full spinlock to achieve
> a) and b).
>
> Way back when, the verifier was updated by two separate 32-bit
> stores; hence the risk of tearing.
>
>
> > nfsd_copy_write_verifier() uses read_seqbegin_or_lock() incorrectly.
> > "seq" is always even, so read_seqbegin_or_lock() can never take the
> > lock for writing. We need to make the counter odd for the 2nd round:
> >
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -359,11 +359,14 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
> >   */
> >  void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
> >  {
> > -	int seq = 0;
> > +	int seq, nextseq = 0;
> >
> >  	do {
> > +		seq = nextseq;
> >  		read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
> >  		memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
> > +		/* If lockless access failed, take the lock. */
> > +		nextseq = 1;
> >  	} while (need_seqretry(&nn->writeverf_lock, seq));
> >  	done_seqretry(&nn->writeverf_lock, seq);
> >  }
> >
> > OTOH. This function just copies 8 bytes, this makes me think that it doesn't
> > need the conditional locking and read_seqbegin_or_lock() at all. So perhaps
> > the (untested) patch below makes more sense? Please note that it should not
> > change the current behaviour, it just makes the code look correct (and more
> > optimal but this is minor).
> >
> > Another question is why we can't simply turn nn->writeverf into seqcount_t.
> > I guess we can't because nfsd_reset_write_verifier() needs spin_lock() to
> > serialise with itself, right?
>
> "reset" is supposed to be a very rare operation. Using a lock in that
> case is probably quite acceptable, as long as reading the verifier
> is wait-free and guaranteed to be untorn.
>
> But a seqcount_t is only 32 bits.
>
>
> > Oleg.
> > ---
> >
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index c7af1095f6b5..094b765c5397 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -359,13 +359,12 @@ static bool nfsd_needs_lockd(struct nfsd_net *nn)
> >   */
> >  void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn)
> >  {
> > -	int seq = 0;
> > +	unsigned seq;
> >
> >  	do {
> > -		read_seqbegin_or_lock(&nn->writeverf_lock, &seq);
> > +		seq = read_seqbegin(&nn->writeverf_lock);
> >  		memcpy(verf, nn->writeverf, sizeof(nn->writeverf));
> > -	} while (need_seqretry(&nn->writeverf_lock, seq));
> > -	done_seqretry(&nn->writeverf_lock, seq);
> > +	} while (read_seqretry(&nn->writeverf_lock, seq));
> >  }
> >
> >  static void nfsd_reset_write_verifier_locked(struct nfsd_net *nn)
> >
>
> --
> Chuck Lever
>
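
For readers following the thread, the "seq is always even" argument can be
illustrated with a small user-space model. This is only a sketch under stated
assumptions, not kernel code: struct model_seqlock, the model_* helpers, the
simulated racing writer and both copy_* functions are hypothetical names made
up for the illustration. The three helpers are meant to mirror the even/odd
convention of read_seqbegin_or_lock(), need_seqretry() and done_seqretry() as
discussed above, in a single-threaded setting where a "writer" is simulated
between the copy and the retry check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical single-threaded model of a seqlock; not the kernel type. */
struct model_seqlock {
	unsigned sequence;	/* even while no writer is active */
	bool excl_held;		/* models the lock taken by an "odd" reader */
	int excl_acquisitions;	/* how many times a reader took the lock */
	int pending_writes;	/* simulated writers racing with the reader */
};

/* Even seq: lockless pass, sample the counter. Odd seq: take the lock. */
static void model_read_seqbegin_or_lock(struct model_seqlock *sl, int *seq)
{
	if (!(*seq & 1)) {
		*seq = (int)sl->sequence;
	} else {
		assert(!sl->excl_held);
		sl->excl_held = true;
		sl->excl_acquisitions++;
	}
}

/* Only a lockless (even) pass can ask for a retry. */
static bool model_need_seqretry(const struct model_seqlock *sl, int seq)
{
	return !(seq & 1) && sl->sequence != (unsigned)seq;
}

/* Drop the lock only if the last pass was the locked (odd) one. */
static void model_done_seqretry(struct model_seqlock *sl, int seq)
{
	if (seq & 1)
		sl->excl_held = false;
}

/* Simulated writer: completes one update unless a reader holds the lock. */
static void model_racing_writer(struct model_seqlock *sl, unsigned verf[2])
{
	if (sl->pending_writes > 0 && !sl->excl_held) {
		sl->pending_writes--;
		sl->sequence += 2;	/* one begin/end pair of a write section */
		verf[0]++;
		verf[1]++;
	}
}

/* Loop shaped like the current code: seq starts at 0 and never becomes odd. */
static int copy_current(struct model_seqlock *sl, unsigned src[2], unsigned dst[2])
{
	int seq = 0, passes = 0;

	do {
		model_read_seqbegin_or_lock(sl, &seq);
		memcpy(dst, src, 2 * sizeof(unsigned));
		passes++;
		model_racing_writer(sl, src);
	} while (model_need_seqretry(sl, seq));
	model_done_seqretry(sl, seq);
	return passes;
}

/* Loop shaped like the first quoted patch: the second pass uses an odd seq. */
static int copy_with_nextseq(struct model_seqlock *sl, unsigned src[2], unsigned dst[2])
{
	int seq, nextseq = 0, passes = 0;

	do {
		seq = nextseq;
		model_read_seqbegin_or_lock(sl, &seq);
		memcpy(dst, src, 2 * sizeof(unsigned));
		passes++;
		model_racing_writer(sl, src);
		nextseq = 1;	/* if the lockless pass failed, lock next time */
	} while (model_need_seqretry(sl, seq));
	model_done_seqretry(sl, seq);
	return passes;
}
```

In this model, copy_current() keeps retrying locklessly for as long as the
simulated writers keep racing and never touches the lock, which is the same
behaviour a plain read_seqbegin()/read_seqretry() loop would have. With
copy_with_nextseq(), the second pass takes the lock and the remaining
simulated writers are held off until the reader is done.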