From: Nick Alcock
To: NFS list
Subject: Re: steam-associated reproducible hard NFSv4.2 client hang (5.9, 5.10)
Date: Sat, 13 Feb 2021 15:21:35 +0000
Message-ID: <87pn14c7sw.fsf@esperi.org.uk>
X-Mailing-List: linux-nfs@vger.kernel.org

(I can't get References: right on this mail due to the original aging out of my mailbox: archive URL, https://www.spinics.net/lists/linux-nfs/msg81430.html).

I now have a little lockdep info from this hang (and reports from at least two others that they've seen similar-looking hangs dating back to 4.19, though much harder to reproduce, taking many hours rather than five minutes: in one case they report not using NFS in production any more because of this).
Unfortunately the lockdep info isn't much use:

Feb 13 14:13:12 silk warning: : [ 888.834464] Showing all locks held in the system:
Feb 13 14:13:12 silk warning: : [ 888.834501] 1 lock held by dmesg/1152:
Feb 13 14:13:12 silk warning: : [ 888.834508] #0: ffff980c3b7200d0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x49/0x2d1
Feb 13 14:13:12 silk warning: : [ 888.834540] 2 locks held by tee/1322:
Feb 13 14:13:12 silk warning: : [ 888.834546] #0: ffff980c0809a430 (sb_writers#12){.+.+}-{0:0}, at: ksys_write+0x6a/0xdc
Feb 13 14:13:12 silk warning: : [ 888.834573] #1: ffff980c3ca7b5e8 (&sb->s_type->i_mutex_key#16){++++}-{3:3}, at: nfs_start_io_write+0x1a/0x45
Feb 13 14:13:12 silk warning: : [ 888.834632] 1 lock held by 192.168.16.8-ma/2302:
Feb 13 14:13:12 silk warning: : [ 888.834638] #0: ffff980c0fe6b700 (&acct->lock#2){+.+.}-{3:3}, at: acct_process+0x102/0x2bc

The first of those is my ongoing dmesg -w. The last is process accounting. The middle one is an ongoing, always-active Xsession-errors tee over the same NFSv4 connection, which says nothing more than that writes to this NFS server from this client have hung, which we already know. There are no signs of locks held by the Steam client which has hung in the middle of installation.

So whateverthehell this is, it's not blocked on a lock. The NFS client is hanging all on its own. (I have no idea how clients can block in the middle of writing if a lock is *not* involved somehow, but that is what it looks like from the lockdep output.)

Does anyone know how I might start debugging this sod?
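
The reading of the lock dump above (three lock holders, only tee inside NFS code, nothing held by Steam) can be checked mechanically. What follows is a minimal Python sketch, not part of any kernel tooling: the function name parse_held_locks and both regular expressions are illustrative assumptions that only need to fit the line format quoted in this mail, and other kernel versions may print the dump differently.

```python
import re

# "N lock(s) held by comm/pid:" header lines in a lockdep dump.
HOLDER_RE = re.compile(r"(\d+) locks? held by (.+)/(\d+):\s*$")
# "#i: address (lock class){usage}-{x:y}, at: symbol+offset/length" entries.
LOCK_RE = re.compile(
    r"#(\d+):\s+([0-9a-f]+)\s+\((.+?)\)\{[^}]*\}.*?, at: (\S+)"
)


def parse_held_locks(text):
    """Map (comm, pid) to a list of (lock class, acquisition site) pairs."""
    holders = {}
    current = None
    for line in text.splitlines():
        header = HOLDER_RE.search(line)
        if header:
            current = (header.group(2), int(header.group(3)))
            holders[current] = []
            continue
        lock = LOCK_RE.search(line)
        if lock and current is not None:
            holders[current].append((lock.group(3), lock.group(4)))
    return holders


# The dump quoted in this mail, verbatim.
SAMPLE = """\
Feb 13 14:13:12 silk warning: : [ 888.834464] Showing all locks held in the system:
Feb 13 14:13:12 silk warning: : [ 888.834501] 1 lock held by dmesg/1152:
Feb 13 14:13:12 silk warning: : [ 888.834508] #0: ffff980c3b7200d0 (&user->lock){+.+.}-{3:3}, at: devkmsg_read+0x49/0x2d1
Feb 13 14:13:12 silk warning: : [ 888.834540] 2 locks held by tee/1322:
Feb 13 14:13:12 silk warning: : [ 888.834546] #0: ffff980c0809a430 (sb_writers#12){.+.+}-{0:0}, at: ksys_write+0x6a/0xdc
Feb 13 14:13:12 silk warning: : [ 888.834573] #1: ffff980c3ca7b5e8 (&sb->s_type->i_mutex_key#16){++++}-{3:3}, at: nfs_start_io_write+0x1a/0x45
Feb 13 14:13:12 silk warning: : [ 888.834632] 1 lock held by 192.168.16.8-ma/2302:
Feb 13 14:13:12 silk warning: : [ 888.834638] #0: ffff980c0fe6b700 (&acct->lock#2){+.+.}-{3:3}, at: acct_process+0x102/0x2bc
"""

holders = parse_held_locks(SAMPLE)

# Tasks holding at least one lock that was taken inside an nfs_* function.
nfs_related = [
    task
    for task, locks in holders.items()
    if any(site.startswith("nfs") for _, site in locks)
]

for (comm, pid), locks in holders.items():
    print(f"{comm}/{pid}:")
    for lock_class, site in locks:
        print(f"    {lock_class} taken at {site}")
print("holders inside NFS code:", nfs_related)
```

Run against a longer capture (for instance a saved dmesg log passed in as text), the same function makes it easy to see whether any task other than the already-known writers ever shows up holding a lock taken in NFS code.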