From: Enrico Scholz
To: linux-nfs@vger.kernel.org
Subject: Random NFS client lockups
Date: Wed, 16 Mar 2022 13:05:02 +0100

Hello,

I am experiencing random NFS client lockups after 1-2 days.
The kernel reports

| nfs: server XXXXXXXXX not responding, timed out

processes are stuck in D state, and only a reboot helps.

The full log is available at

  https://pastebin.pl/view/7d0b345b

I can see one oddity there: shortly before the timeouts, the log shows
at 05:07:28:

| worker connecting xprt 0000000022aecad1 via tcp to XXXX:2001:1022:: (port 2049)
| 0000000022aecad1 connect status 0 connected 0 sock state 8

All other connects go through EINPROGRESS first:

| 0000000022aecad1 connect status 115 connected 0 sock state 2
| ...
| state 8 conn 1 dead 0 zapped 1 sk_shutdown 1

After the 'status 0' connect, rpcdebug shows (around 05:07:43)

| --> nfs4_alloc_slot used_slots=03ff highest_used=9 max_slots=30
| <-- nfs4_alloc_slot used_slots=07ff highest_used=10 slotid=10
| ...
| <-- nfs4_alloc_slot used_slots=fffffff highest_used=27 slotid=27
| --> nfs4_alloc_slot used_slots=fffffff highest_used=27 max_slots=30
| ...
| --> nfs4_alloc_slot used_slots=3fffffff highest_used=29 max_slots=30
| <-- nfs4_alloc_slot used_slots=3fffffff highest_used=29 slotid=4294967295
| nfs41_sequence_process: Error 1 free the slot

and the NFS server then times out.

At nearly the same time, the server reports

| Mar 16 05:02:40 kernel: rpc-srv/tcp: nfsd: got error -32 when sending 112 bytes - shutting down socket

Similar messages (with other sizes and sometimes error -104) appear
frequently without a related client lockup.

How can I debug this further or solve it?

It happens (at least) with:

 - a Fedora 35 client with kernel 5.16.7,
   kernel-5.17.0-0.rc7.20220310git3bf7edc84a9e.119.fc37.x86_64 and some
   others in between

 - a Rocky Linux 8.5 server with kernel-4.18.0-348.12.2 and
   kernel-4.18.0-348.2.1

The problem started after a power outage where the whole infrastructure
rebooted. I ran the setup with kernel 5.16.7 on the client and
4.18.0-348.2.1 on the server without problems before the outage.

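
For reference, the numeric values in the log excerpts above can be decoded with a few lines of Python. This is only a sketch: the tables are hard-coded from the Linux errno and TCP state numbering, the function names are made up for illustration, and reading slotid=4294967295 as the kernel's "no slot" marker (NFS4_NO_SLOT, i.e. (u32)-1) is an assumption about the client code rather than something the log states.

```python
# Linux errno values that appear in the excerpts (hard-coded so the
# sketch does not depend on the platform it runs on).
ERRNO_NAMES = {0: "success", 32: "EPIPE", 104: "ECONNRESET", 115: "EINPROGRESS"}

# Subset of the Linux TCP socket state numbers.
TCP_STATES = {1: "TCP_ESTABLISHED", 2: "TCP_SYN_SENT", 7: "TCP_CLOSE", 8: "TCP_CLOSE_WAIT"}

# Assumed meaning of slotid=4294967295: (u32)-1, "no slot available".
NFS4_NO_SLOT = 0xFFFFFFFF


def decode_used_slots(mask_hex, max_slots):
    """Return (used, free) slot counts for a used_slots bitmask."""
    mask = int(mask_hex, 16)
    used = sum(1 for i in range(max_slots) if (mask >> i) & 1)
    return used, max_slots - used


def describe_connect(status, sock_state):
    """Translate 'connect status N ... sock state M' into names."""
    return "{} / {}".format(
        ERRNO_NAMES.get(abs(status), "errno {}".format(abs(status))),
        TCP_STATES.get(sock_state, "state {}".format(sock_state)),
    )


if __name__ == "__main__":
    print(decode_used_slots("03ff", 30))      # first excerpt line
    print(decode_used_slots("3fffffff", 30))  # last allocation attempt
    print(4294967295 == NFS4_NO_SLOT)
    print(describe_connect(0, 8))             # the odd connect at 05:07:28
    print(describe_connect(115, 2))           # the usual connect
    print(describe_connect(-32, 8), describe_connect(-104, 8))
```

Read this way, the mask 3fffffff has all 30 slots in use, so the last nfs4_alloc_slot call had nothing left to hand out. The odd connect reports success while the socket is in CLOSE_WAIT, and the server-side errors -32 and -104 are EPIPE and ECONNRESET.
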
The issue affects a /home directory mounted with

| XXXX:/home /home nfs4 rw,seclabel,nosuid,nodev,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,soft,posix,proto=tcp6,timeo=600,retrans=2,sec=krb5p,clientaddr=XXXX:2001:...,local_lock=none,addr=XXXX:2001:1022:: 0 0

It also happens without the "soft" option. Applications such as firefox
or chromium are running, which hold locks and access the filesystem
frequently.

The log file was created with rpcdebug enabled for

| nfs proc xdr root callback client mount pnfs pnfs_ld state
| rpc xprt call debug nfs auth bind sched trans svcsock svcdsp misc cache



Enrico
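
As a reading aid for the mount line above, the following sketch splits the option string and converts the timing values. The helper name is hypothetical, and the clientaddr= and addr= options are left out because their values are elided in the message. The only outside fact used is that timeo is given in tenths of a second, as documented in nfs(5).

```python
def parse_mount_options(option_string):
    """Split a comma-separated mount option string into a dict.

    Flags without a value (e.g. "soft") map to True.
    """
    parsed = {}
    for item in option_string.split(","):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


# Options copied from the mount line, minus the elided address options.
OPTIONS = (
    "rw,seclabel,nosuid,nodev,relatime,vers=4.2,rsize=262144,wsize=262144,"
    "namlen=255,soft,posix,proto=tcp6,timeo=600,retrans=2,sec=krb5p,"
    "local_lock=none"
)

opts = parse_mount_options(OPTIONS)

# timeo is expressed in deciseconds, so 600 means 60 seconds.
timeo_seconds = int(opts["timeo"]) / 10
rsize_kib = int(opts["rsize"]) // 1024

if __name__ == "__main__":
    print("soft mount:", opts.get("soft", False))
    print("timeo: {} s, retrans: {}".format(timeo_seconds, opts["retrans"]))
    print("rsize: {} KiB".format(rsize_kib))
```
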