From: "Benjamin Coddington"
To: "Leon Kyneur"
Cc: linux-nfs@vger.kernel.org
Subject: Re: troubleshooting LOCK FH and NFS4ERR_BAD_SEQID
Date: Tue, 17 Sep 2019 07:28:43 -0400
Message-ID: <8217416C-F3E5-4BEE-BD01-2BE19952425E@redhat.com>

On 12 Sep 2019, at 4:27, Leon Kyneur
wrote:

> Hi
>
> I'm experiencing an issue on NFS 4.0 + 4.1 where we cannot call fcntl
> locks on any file on the share. The problem goes away if the share is
> umount && mount (mount -o remount does not resolve the issue)
>
> Client:
> EL 7.4 3.10.0-693.5.2.el7.x86_64 nfs-utils-1.3.0-0.48.el7_4.x86_64
>
> Server:
> EL 7.4 3.10.0-693.5.2.el7.x86_64 nfs-utils-1.3.0-0.48.el7_4.x86_64
>
> I can't figure this out but the client reports bad-sequence-id in
> duplicate in the logs:
> Sep 12 02:16:59 client kernel: NFS: v4 server returned a bad
> sequence-id error on an unconfirmed sequence ffff881c52286220!
> Sep 12 02:16:59 client kernel: NFS: v4 server returned a bad
> sequence-id error on an unconfirmed sequence ffff881c52286220!
> Sep 12 02:17:39 client kernel: NFS: v4 server returned a bad
> sequence-id error on an unconfirmed sequence ffff8810889cb020!
> Sep 12 02:17:39 client kernel: NFS: v4 server returned a bad
> sequence-id error on an unconfirmed sequence ffff8810889cb020!
> Sep 12 02:17:44 client kernel: NFS: v4 server returned a bad
> sequence-id error on an unconfirmed sequence ffff881b414b2620!
>
> wireshark capture shows only 1 BAD_SEQID reply from the server:
> $ tshark -r client_broken.pcap -z proto,colinfo,rpc.xid,rpc.xid -z
> proto,colinfo,nfs.seqid,nfs.seqid -R 'rpc.xid == 0x9990c61d'
> tshark: -R without -2 is deprecated. For single-pass filtering use -Y.
> 141 93 172.27.30.129 -> 172.27.255.28 NFS 352 V4 Call LOCK FH:
> 0x80589398 Offset: 0 Length: nfs.seqid == 0x0000004e
> nfs.seqid == 0x00000002 rpc.xid == 0x9990c61d
> 142 93 172.27.255.28 -> 172.27.30.129 NFS 124 V4 Reply (Call
> In 141) LOCK Status: NFS4ERR_BAD_SEQID rpc.xid == 0x9990c61d
>
> system call I have identified as triggering it is:
> fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824,
> len=1}) = -1 EIO (Input/output error)

Can you simplify the trigger into something repeatable? Can you
determine if the client or the server has lost track of the sequence?
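
One way to approach a repeatable trigger might be a small script that
issues the same lock request in a loop. The sketch below is only an
illustration and is not part of the original report: the function names
try_read_lock and run_attempts are hypothetical, the byte range is
copied from the fcntl() call quoted above, and it assumes that Python's
fcntl.lockf() with LOCK_SH | LOCK_NB results in a non-blocking F_RDLCK
request like the one in the strace output. Whether it actually provokes
NFS4ERR_BAD_SEQID on an affected mount is untested.

```python
import errno
import fcntl
import os

# Byte range copied from the quoted fcntl(3, F_SETLK, ...) call.
LOCK_START = 1073741824
LOCK_LEN = 1


def try_read_lock(path):
    """Take and release a non-blocking read lock on one byte.

    Returns 0 on success, or the errno value if the lock call failed.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Shared, non-blocking POSIX byte-range lock at LOCK_START.
        fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB,
                    LOCK_LEN, LOCK_START, os.SEEK_SET)
        fcntl.lockf(fd, fcntl.LOCK_UN, LOCK_LEN, LOCK_START, os.SEEK_SET)
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)


def run_attempts(path, attempts=100):
    """Repeat the lock attempt; return (attempt number, errno name) per failure."""
    failures = []
    for i in range(attempts):
        err = try_read_lock(path)
        if err:
            failures.append((i, errno.errorcode.get(err, str(err))))
    return failures
```

Calling run_attempts() on a file under the affected mount (the path is
whatever file the application was locking) while a packet capture is
running would tie each EIO to a LOCK call on the wire, which may help
show which side's seqid is out of step.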

> The server filesystem is ZFS though NFS sharing is turned off via ZFS
> options and it's exported using /etc/exports / nfsd...
>
> The BAD_SEQID error seems to be fairly random, we have over 2000
> machines connected to the share and it's experienced frequently but
> randomly across our clients.
>
> It's worth mentioning that the majority of the clients are mounting
> 4.0 we did try 4.1 everywhere but hit this
> https://access.redhat.com/solutions/3146191

This was fixed in kernel-3.10.0-735.el7, FWIW..

Ben