From: Jan Kasiak
Date: Tue, 9 Aug 2022 10:16:26 -0400
Subject: Re: Question about nlmclnt_lock
To: Tom Talpey
Cc: Trond Myklebust, "linux-nfs@vger.kernel.org"

Sorry, I meant FreeBSD uses the caller field as well.

-Jan

On Tue, Aug 9, 2022 at 10:14 AM Jan Kasiak wrote:
>
> Thanks for all of the resources!
>
> I was trying to implement an NFS server, and v3 sounded like an easier
> place to start :-)
>
> I think I'll move on to v4.
>
> If we're revisiting the past, maybe just one last historical question:
>
> Do either of you know why the Linux Kernel only uses the IP
> address/svid to identify the caller?
>
> FreeBSD uses the owner field as well.
>
> Jan
>
> On Sun, Aug 7, 2022 at 8:01 AM Tom Talpey wrote:
> >
> > On 8/6/2022 3:49 PM, Trond Myklebust wrote:
> > > On Sat, 2022-08-06 at 11:03 -0400, Jan Kasiak wrote:
> > >> Hi Trond,
> > >>
> > >> The v4 RFCs do mention protocol design flaws, but don't go into more
> > >> detail.
> > >>
> > >> I was trying to understand those flaws in order to understand how and
> > >> why v3 was problematic.
> > >>
> > >>
> > >
> > > The main issues derive from the fact that NLM is a side band protocol,
> > > meaning that it has no ability to influence the NFS protocol
> > > operations. In particular, there is no way to ensure safe ordering of
> > > locks and I/O. e.g. if your readahead code kicks in while you are
> > > unlocking the file, then there is nothing that guarantees the page
> > > reads happened while the lock was in place on the server.
> > > The same weakness also causes problems for reboots: if your client
> > > doesn't notice that the server rebooted (and lost your locks) because
> > > the statd callback mechanism failed, then you're SOL. Your I/O may
> > > succeed, but can end up causing problems for another client that has
> > > since grabbed the lock and assumes it now has exclusive access to the
> > > file.
> > >
> > > NLM also suffers from intrinsic problems of its own such as lack of
> > > only-once semantics. If you send a blocking LOCK request, and
> > > subsequently send a CANCEL operation, then who knows whether or not the
> > > lock or the cancel get processed first by the server? Many servers will
> > > reply LCK_GRANTED to the CANCEL even if they did not find the lock
> > > request.
> > > Sending an UNLOCK can also cause issues if the lock was
> > > granted via a blocking lock callback (NLM_GRANTED) since there is no
> > > ordering between the reply to the NLM_GRANTED and the UNLOCK.
> > >
> > > Finally, as already mentioned, there are multiple issues associated
> > > with client or server reboot. The NLM mechanism is pretty dependent on
> > > yet another side band mechanism (STATD) to tell you when this occurs,
> > > but that mechanism does not work to release the locks held by a client
> > > if it fails to come back after reboot. Even if the client does come
> > > back, it might forget to invoke the statd process, or it might use a
> > > different identifier than it did during the last boot instance (e.g.
> > > because DHCP allocated a different IP address, or the IP address is not
> > > unique due to use of NAT, or a hostname was used that is non-unique,
> > > ...).
> > > If the server reboots, then it may fail to notify the client of that
> > > reboot through the callback mechanism. Reasons may include the
> > > existence of a NAT, failure of the rpcbind/portmapper process on the
> > > client, firewalls, ...
> >
> > That brought back memories.
> >
> > http://www.nfsv4bat.org/Documents/ConnectAThon/2006/talpey-cthon06-nsm.pdf
> >
> > Here's an even older issues list for nlm on Solaris circa 1996.
> > The portrait-mode slides are in reverse order. :)
> >
> > http://www.nfsv4bat.org/Documents/ConnectAThon/1996/lockmgr.pdf
> >
> > The NLM protocol is an antique and hasn't been looked at in well
> > over a decade (or two!). NLMv4 (circa 1995) widened offsets to
> > 64-bit, which was the last innovation it got. None of the RPC
> > sideband protocols were ever standardized, btw.
> >
> > Jan, what are you planning to use it for? Personally I'd advise
> > against pretty much anything.
> >
> > Tom.
> >
> > >
> > >> -Jan
> > >>
> > >>
> > >> On Fri, Aug 5, 2022 at 10:27 PM Trond Myklebust wrote:
> > >>>
> > >>> On Fri, 2022-08-05 at 19:17 -0400, Jan Kasiak wrote:
> > >>>> Hi,
> > >>>>
> > >>>> I was looking at the code for nlmclnt_lock and wanted to ask a
> > >>>> question about how the Linux kernel client and the NLM 4 protocol
> > >>>> handle some errors around certain edge cases.
> > >>>>
> > >>>> Specifically, I think there is a race condition around two threads
> > >>>> of the same program acquiring a lock, one of the threads being
> > >>>> interrupted, and the NFS client sending an unlock when none of the
> > >>>> program threads called unlock.
> > >>>>
> > >>>> On NFS server machine S:
> > >>>> there exists an unlocked file F
> > >>>>
> > >>>> On NFS client machine C:
> > >>>> in program P:
> > >>>> thread 1 tries to lock(F) with fd A
> > >>>> thread 2 tries to lock(F) with fd B
> > >>>>
> > >>>> The Linux client will issue two NLM_LOCK calls with the same svid
> > >>>> and same range, because it uses the program id to map to an svid.
> > >>>>
> > >>>> For whatever reason, assume the connection is broken (cable gets
> > >>>> pulled etc...)
> > >>>> and `status = nlmclnt_call(cred, req, NLMPROC_LOCK);` fails.
> > >>>>
> > >>>> The Linux client will retry the request, but at some point thread 1
> > >>>> receives a signal and nlmclnt_lock breaks out of its loop. Because
> > >>>> the Linux client request failed, it will fall through and go to the
> > >>>> out_unlock label, where it will want to send an unlock request.
> > >>>>
> > >>>> Assume that at some point the connection is reestablished.
> > >>>>
> > >>>> The Linux kernel client now has two outstanding lock requests to
> > >>>> send to the remote server: one for a lock that thread 2 is still
> > >>>> trying to acquire, and one for an unlock of thread 1 that failed
> > >>>> and was interrupted.
> > >>>>
> > >>>> I'm worried that the Linux client may first send the lock request,
> > >>>> and tell thread 2 that it acquired the lock, and then send an
> > >>>> unlock request from the cancelled thread 1 request.
> > >>>>
> > >>>> The server will successfully process both requests, because the
> > >>>> svid is the same for both, and the true server side state will be
> > >>>> that the file is unlocked.
> > >>>>
> > >>>> One can talk about the wisdom of using multiple threads to acquire
> > >>>> the same file lock, but this behavior is weird, because none of the
> > >>>> threads called unlock.
> > >>>>
> > >>>> I have experimented with reproducing this, but have not been
> > >>>> successful in triggering this ordering of events.
> > >>>>
> > >>>> I've also looked at the code in clntproc.c and I don't see a spot
> > >>>> where outstanding failed lock/unlock requests are checked while
> > >>>> processing lock requests?
> > >>>>
> > >>>> Thanks,
> > >>>> -Jan
> > >>>
> > >>> Nobody here is likely to want to waste much time trying to 'fix' the
> > >>> NLM locking protocol. The protocol itself is known to be extremely
> > >>> fragile, and the endemic problems constitute some of the main
> > >>> motivations for the development of the NFSv4 protocol
> > >>> (See https://datatracker.ietf.org/doc/html/rfc2624#section-8
> > >>> and https://datatracker.ietf.org/doc/html/rfc7530#section-9).
> > >>>
> > >>> If you need more reliable support for POSIX locks beyond what exists
> > >>> today for NLM, then please consider NFSv4.
> > >>>
> > >>> --
> > >>> Trond Myklebust
> > >>> Linux NFS client maintainer, Hammerspace
> > >>> trond.myklebust@hammerspace.com
> > >>>
> > >>>
> > >
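
The ordering raised in the original question (thread 2's LOCK delivered
first, then thread 1's stale UNLOCK, both carrying the same caller/svid
identity) can be replayed against a toy lock table. This is only a sketch
under stated assumptions, not the kernel's lockd code or the NLM wire
protocol: ToyLockServer, replay, the address 10.0.0.5, the svid 4242 and
the per-descriptor tokens "fdA"/"fdB" are all hypothetical, and the toy
server is assumed to answer LCK_GRANTED to an UNLOCK that matches nothing.

```python
from dataclasses import dataclass, field

GRANTED = "LCK_GRANTED"
DENIED = "LCK_DENIED"


@dataclass
class ToyLockServer:
    """Hypothetical stand-in for a lock server: one whole-file lock per name."""
    holders: dict = field(default_factory=dict)  # file name -> holder identity

    def lock(self, name, identity):
        holder = self.holders.get(name)
        # A request from the identity that already holds the lock is granted
        # again, which is why two threads sharing an identity look like one.
        if holder is None or holder == identity:
            self.holders[name] = identity
            return GRANTED
        return DENIED

    def unlock(self, name, identity):
        # Assumption of this toy: an UNLOCK that matches no lock still
        # reports success; it simply removes nothing.
        if self.holders.get(name) == identity:
            del self.holders[name]
        return GRANTED


def replay(identity_t1, identity_t2):
    """Replay the worrying order after the connection comes back.

    Thread 1's LOCK never reached the server and thread 1 was interrupted,
    so only its UNLOCK is still queued. Thread 2's LOCK goes out first.
    Returns (status seen by thread 2, whether the server still holds a lock).
    """
    server = ToyLockServer()
    t2_status = server.lock("F", identity_t2)
    server.unlock("F", identity_t1)
    return t2_status, "F" in server.holders


# Both threads map to the same (address, svid) pair.
same = ("10.0.0.5", 4242)
print(replay(same, same))

# Hypothetical contrast: identities that also carry a per-descriptor token.
print(replay(same + ("fdA",), same + ("fdB",)))
```

With a shared identity the first call reports that thread 2 was granted the
lock while the server no longer holds one, which is the state described in
the question. In the second call the stale UNLOCK matches nothing, so the
lock survives; whether any real server distinguishes requests this way is
not established by this thread.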