Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp66176rwl; Thu, 6 Apr 2023 14:56:10 -0700 (PDT) X-Google-Smtp-Source: AKy350ZMTXbDkpXsYHPjUk+JwOvLXSLOWJn67YICZsW8s8wQjdwUSOYYg65UqTSLpJ0HcO3xFUJK X-Received: by 2002:a05:6a20:b90a:b0:da:949:f074 with SMTP id fe10-20020a056a20b90a00b000da0949f074mr790700pzb.43.1680818170302; Thu, 06 Apr 2023 14:56:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680818170; cv=none; d=google.com; s=arc-20160816; b=J5Dp+5yQC8zhWJagofEEvpH7A2QScnL1ynWTFLtIOc74/J9b0E003xCjO33eg2S39d ufmQ5iyd3ra86fVN3mvgV1n5rDcre3P/cFjEktHLOxsOVMtE5K1zNDzJf75XDe3gSS5A YdyNIi/65Rz4jef4UMVa3ovE1TC1r947P6e2kA1NzqHiFgJoRH3qpmSHgubukvZpTgPz IdU71fObu80wOT1pzZ7t+NOJEtwpjU2NWMW9u7q/y+WQw6VESiqUTfiIwwtqKIzzVhsn 0oe3WwtNjQu/YzBuKlmrRmTKX9K8TBdEU+xotVYYdjEfqwjmMNKW1FQAG0dOmVsvRnuG CDFg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:user-agent :content-transfer-encoding:references:in-reply-to:date:cc:to:from :subject:message-id:dkim-signature; bh=Pw5g8sc/Gt4qfQQPF8TkMv5agFQZ/Q2FdZX199wWTvM=; b=SMOFryCfp5nqXI7Y1ctYMQXvIUU0ebThx7VnSnB3qCEM3biZD99iurOIM0FWfxoSFe wBZQfwjzztg1epSM2omh72fDEvHK+jWfc3ObRoPfCfIpfMeBk6D48kvjAGmIugy25knB Khso/R2Kj5dV5Cb6bKq68SN3wARPytBRDOPjcaj1sSnY/ctd7blvLgJ8u4Uttk0YR3s6 2S1VXqoFBkSiiNBL3nlh1PB6IZe2Nd5uzuDPzZFf5ffYeynciQK055RnDJz2wa26sm6u S0ebd8BSmNLiPMicgiQdZDNvyqGyAGFXX+ketOVuEsUyyoS/up1gwl+obDHG0wuddOtk V0mg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=R8mIb1Nf; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id v188-20020a632fc5000000b005098f98e16csi2078627pgv.321.2023.04.06.14.55.26; Thu, 06 Apr 2023 14:56:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=R8mIb1Nf; spf=pass (google.com: domain of linux-nfs-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-nfs-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232375AbjDFVwD (ORCPT + 99 others); Thu, 6 Apr 2023 17:52:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39600 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229459AbjDFVwC (ORCPT ); Thu, 6 Apr 2023 17:52:02 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C4D9483CE for ; Thu, 6 Apr 2023 14:52:01 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 605DB64C35 for ; Thu, 6 Apr 2023 21:52:01 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 253EFC433D2; Thu, 6 Apr 2023 21:52:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1680817920; bh=F6Htyrs5w1XHen7+WnroJzBGKkT3iBeVb7zhZxSQ+g4=; h=Subject:From:To:Cc:Date:In-Reply-To:References:From; b=R8mIb1Nfm/y2IboXgIGK8qh9ed99iVv07mlH6qfHfmbxi9NZtwIT+Paa7Kt3SW82V DpnvmL4MIUizfI3JqXfewMutK4Zrarxy5F0XS2+sRg9/B7v1RiG13aA9v2+nHWj9ZY pkJzmGFN7n6ZvhOXu57v+9PMaFnRXduCj0HVjs0yxbOWhoT5LkNB7pL9j2Oqfdl0b3 sR+7CfAoCcNma8GWsJh5OiGtDq4db0ZI9Hi/nNlGRbgDGf5yzzwd7ClQzpyYvUL53w eoqUn3Xa8G6IL4v9qLbyeKJgWnbulatp/aIirAL1bygQl2tq3p7zbA53GirbeJrx/L 4nrM4yq5tuPaA== Message-ID: Subject: Re: [PATCH] SUNRPC: remove the maximum number of retries in call_bind_status From: Jeff Layton To: dai.ngo@oracle.com, Chuck Lever III Cc: Linux NFS Mailing List , Helen Chao , Anna Schumaker , Trond Myklebust Date: Thu, 06 Apr 2023 17:51:58 -0400 In-Reply-To: References: <1678301132-24496-1-git-send-email-dai.ngo@oracle.com> <9D5A190A-333A-4470-8572-CF85EE9A8086@oracle.com> <182842b2-3de4-d64b-d729-f4f6c9c576d6@oracle.com> <64c4e5c4e61962fd828bcbef79db1df6466a875d.camel@kernel.org> <07d8f93d-c8f3-5ede-66a3-219eea0fdfe6@oracle.com> Content-Type: text/plain; charset="ISO-8859-15" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.46.4 (3.46.4-1.fc37) MIME-Version: 1.0 X-Spam-Status: No, score=-5.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI,SPF_HELO_NONE, SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On Thu, 2023-04-06 at 13:58 -0700, dai.ngo@oracle.com wrote: > On 4/6/23 12:59 PM, Jeff Layton wrote: > > On Thu, 2023-04-06 at 19:43 +0000, Chuck Lever III wrote: > > > > On Apr 6, 2023, at 3:36 PM, Dai Ngo wrote: > > > >=20 > > > > Hi Jeff, > > > >=20 > > > > Thank you for taking a look at the patch. > > > >=20 > > > > On 4/6/23 11:10 AM, Jeff Layton wrote: > > > > > On Thu, 2023-04-06 at 13:33 -0400, Jeff Layton wrote: > > > > > > On Tue, 2023-03-14 at 09:19 -0700, dai.ngo@oracle.com wrote: > > > > > > > On 3/8/23 11:03 AM, dai.ngo@oracle.com wrote: > > > > > > > > On 3/8/23 10:50 AM, Chuck Lever III wrote: > > > > > > > > > > On Mar 8, 2023, at 1:45 PM, Dai Ngo wrote: > > > > > > > > > >=20 > > > > > > > > > > Currently call_bind_status places a hard limit of 3 to = the number of > > > > > > > > > > retries on EACCES error. This limit was done to accommo= date the > > > > > > > > > > behavior > > > > > > > > > > of a buggy server that keeps returning garbage when the= NLM daemon is > > > > > > > > > > killed on the NFS server. However this change causes pr= oblem for other > > > > > > > > > > servers that take a little longer than 9 seconds for th= e port mapper to > > > > > > > > > >=20 > > > > > > > > > >=20 > > > > > Actually, the EACCES error means that the host doesn't have the p= ort > > > > > registered. > > > > Yes, our SQA team runs stress lock test and restart the NFS server. > > > > Sometimes the NFS server starts up and register to the port mapper = in > > > > time and there is no problem but occasionally it's late coming up c= ausing > > > > this EACCES error. > > > >=20 > > > > > That could happen if (e.g.) the host had a NFSv3 mount up > > > > > with an NLM connection and then crashed and rebooted and didn't r= emount > > > > > it. > > > > can you please explain this scenario I don't quite follow it. If th= e v3 > > > > client crashed and did not remount the export then how can the clie= nt try > > > > to access/lock anything on the server? I must have missing somethin= g here. > > > >=20 > > > > > =20 > > Suppose you have a client with an admin that mounts a NFSv3 mount "by > > hand" (and doesn't set up statd to run at boot time). Client requests a= n > > NLM lock and then reboots. > >=20 > > When it comes up, there's no notification to the server that the client > > rebooted. Later, the lock becomes free and the server tries to grant it > > to the client. It talks to rpcbind but lockd is never started and the > > server keeps querying the client's rpcbind forever. > >=20 > > Maybe more likely situation: the client crashes and loses its DHCP > > address when it comes back up, and the old addr gets reassigned to > > another host that has rpcbind running but no NFS. > >=20 > > Either way, it'd keep trying to call the client back indefinitely that > > way. >=20 > Got it Jeff, thank you for the explanation. This is when NLM requests > are originated from the NFS server. >=20 Mostly, yes. The old, stateless NFS v2/v3 server code didn't have much in the way of callbacks, and v4 (for the most part) doesn't rely on rpcbind. That said, there may be some RPC calls done by the v2/3 NFS client that don't have a direct connection to a client task. Consider stuff like writeback requests. Signals won't do anything there. I think keeping a hard timeout of some sort is probably prudent. >=20 --=20 Jeff Layton