Date: Tue, 13 Apr 2021 15:29:08 -0400
From: "J. Bruce Fields"
To: Olga Kornievskaia
Cc: Rick Macklem, Linux-NFS
Subject: Re: Linux NFSv4.1 client session seqid sometimes advances by 2
Message-ID: <20210413192908.GD28230@fieldses.org>

On Tue, Apr 13, 2021 at 02:59:27PM -0400, Olga Kornievskaia wrote:
> On Tue, Apr 13, 2021 at 1:17 PM J. Bruce Fields wrote:
> >
> > On Tue, Apr 13, 2021 at 09:31:37AM -0400, Olga Kornievskaia wrote:
> > > On Tue, Apr 13, 2021 at 3:08 AM Rick Macklem wrote:
> > > >
> > > > Hi,
> > > >
> > > > During testing of a Fedora Core 30 (5.2.10 kernel) against a FreeBSD
> > > > server (4.1 mount), I have been simulating a network partitioning
> > > > for a few minutes (until the TCP connection goes to SYN_SENT on
> > > > the Linux client).
> > > >
> > > > Sometimes, after the network partition heals, the FreeBSD server
> > > > replies NFS4ERR_SEQ_MISORDERED.
> > > > Looking at the packet trace, the seqid for the slot has advanced by
> > > > 2 instead of 1. An RPC request for old-seqid + 1 never seems to get
> > > > sent.
> > > > --> Since sending an RPC with "seqid + 2" but never sending one
> > > > that is "seqid + 1" for a slot seems harmless, I have added an optional
> > > > hack (can be turned off), to allow this case instead of replying
> > > > NFS4ERR_SEQ_MISORDERED for it. The code will still reply
> > > > NFS4ERR_SEQ_MISORDERED if an RPC for the slot with
> > > > "old seqid + 1" in it.
> > > > --> Yes, doing this hack is a violation of RFC5661, but I've
> > > > done it anyhow.
> > > >
> > > > If you are interested in a packet capture with this in it:
> > > > fetch https://people.freebsd.org/~rmacklem/linuxtofreenfs.pcap
> > > > - then look at packet #1945 and #2072
> > > > --> You'll see that slot #1 seqid goes from 4 to 6. There is no
> > > > slot#1 seqid 5 RPC sent on the wire.
> > > > (This packet capture was taken on the Linux client using
> > > > tcpdump.)
> > > > --> Btw, the "RST battle" you'll see in the above trace between
> > > > #2005 and #2068 that goes on until the FreeBSD
> > > > krpc/NFS times out the connection after 6min. seems to be a recent
> > > > FreeBSD TCP bug.
> > > > I have reproduced this seqid advances by 2 on an older system
> > > > that does not "RST battle" and allows the reconnect right away,
> > > > once the network partition is healed, so it does seem to be
> > > > relevant to this bug.
> > > >
> > > > Someday, I will get around to upgrading to a more recent Linux
> > > > kernel and will test to see if I can still reproduce this bug.
> > > > On 5.2.10, it is intermittent and does not occur every time I
> > > > do the network partitioning test.
> > > >
> > > > Mostly just fyi, rick
> > >
> > > Hi Rick,
> > >
> > > I think this is happening because slotid=1 had something queued up
> > > using seqid=5 and that was interrupted because the connection was
> > > RSTed. For the interrupted slot, the client would send solo SEQUENCE
> > > with +1 seqid.
> >
> > Doesn't the client send the solo SEQUENCE with seqid 5 in that case?
>
> No it sends with seq+1 because NFS layer client doesn't know if seqid
> actually was actually transmitted before the connection got caught
> (and/or received by the server).

But then the MISORDERED tells the client it wasn't received, so the
client follows up with a call with seqid 5--is that what happens?

Sorry, I seem to recall we went through this all a couple years ago,
but now I've forgotten how it works.

--b.
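For readers following the seqid arithmetic, here is a minimal Python sketch of the per-slot check as I read RFC 5661 section 2.10.6.1: a seqid equal to the cached one is a replay, one greater (modulo 2^32) is a new request, anything else gets NFS4ERR_SEQ_MISORDERED. It is a model, not FreeBSD's or Linux's code. `ServerSlot`, `SlotResult`, `allow_skip_one`, and `hypothetical_client_recovery` are invented names; `allow_skip_one` approximates Rick's optional "seqid + 2" hack, and the client function models the recovery Bruce asks about (retry with seqid 5 after MISORDERED), which is an assumption, not confirmed Linux client behavior. Reply caching and NFS4ERR_SEQ_FALSE_RETRY are not modeled.

```python
from enum import Enum

SEQID_MOD = 2 ** 32  # slot sequence IDs are 32-bit and wrap modulo 2^32


class SlotResult(Enum):
    NEW_REQUEST = "new request"
    REPLAY = "replay"  # same seqid as cached: would be answered from the reply cache
    SEQ_MISORDERED = "NFS4ERR_SEQ_MISORDERED"


class ServerSlot:
    """Hypothetical model of one server-side session slot."""

    def __init__(self, cached_seqid: int, allow_skip_one: bool = False):
        self.cached_seqid = cached_seqid % SEQID_MOD
        # Non-RFC tolerance modeled on the optional hack in the thread:
        # also accept cached + 2. Off by default (strict RFC 5661 check).
        self.allow_skip_one = allow_skip_one

    def check(self, seqid: int) -> SlotResult:
        # Distance from the cached seqid, computed with 32-bit wraparound.
        delta = (seqid - self.cached_seqid) % SEQID_MOD
        if delta == 0:
            return SlotResult.REPLAY
        if delta == 1 or (delta == 2 and self.allow_skip_one):
            self.cached_seqid = seqid % SEQID_MOD
            return SlotResult.NEW_REQUEST
        # Covers a late "old seqid + 1" after a +2 was accepted, too.
        return SlotResult.SEQ_MISORDERED


def hypothetical_client_recovery(slot: ServerSlot, last_acked: int, interrupted: bool):
    """Hypothetical client: after an interrupted call it skips ahead by one,
    and on SEQ_MISORDERED steps back and retries once (Bruce's question)."""
    seqid = (last_acked + (2 if interrupted else 1)) % SEQID_MOD
    result = slot.check(seqid)
    if result is SlotResult.SEQ_MISORDERED and interrupted:
        seqid = (seqid - 1) % SEQID_MOD
        result = slot.check(seqid)
    return seqid, result
```

Plugging in the trace (slot #1 cached at seqid 4, seqid 5 never on the wire): a strict `ServerSlot(4)` rejects 6 and would accept a follow-up 5, while `ServerSlot(4, allow_skip_one=True)` accepts 6 and then rejects a late 5, matching what Rick describes for his hack.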