Date: Thu, 12 Aug 2021 11:40:01 -0400
From: "J. Bruce Fields"
To: Olga Kornievskaia
Cc: Chuck Lever III, Bruce Fields, Timo Rothenpieler, Linux NFS Mailing List, Dai Ngo
Subject: Re: Spurious instability with NFSoRDMA under moderate load

On Wed, Aug 11, 2021 at 04:40:04PM -0400, Olga Kornievskaia wrote:
> On Wed, Aug 11, 2021 at 4:14 PM J. Bruce Fields wrote:
> >
> > On Wed, Aug 11, 2021 at 08:01:30PM +0000, Chuck Lever III wrote:
> > > Probably not just CB_RECALL, but agreed, there doesn't seem to
> > > be any mechanism that can re-drive callback operations when the
> > > backchannel is replaced.
> >
> > The nfsd4_queue_cb() in nfsd4_cb_release() should queue a work item
> > to run nfsd4_run_cb_work, which should set up another callback client if
> > necessary.

But I think the result is it'll look to see if there's another
connection available for callbacks, and give up immediately if not.
There's no logic to wait for the client to fix the problem.

> diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
> index 7325592b456e..ed0e76f7185c 100644
> --- a/fs/nfsd/nfs4callback.c
> +++ b/fs/nfsd/nfs4callback.c
> @@ -1191,6 +1191,7 @@ static void nfsd4_cb_done(struct rpc_task *task, void *calldata)
>                 case -ETIMEDOUT:
>                 case -EACCES:
>                         nfsd4_mark_cb_down(clp, task->tk_status);
> +                       cb->cb_need_restart = true;
>                 }
>                 break;
>         default:
>
> Something like this should requeue and retry the callback?

I think we'd need more than just that.

--b.