Date: Fri, 24 May 2019 11:31:55 -0600
From: Alan Post
To: "linux-nfs@vger.kernel.org"
Subject: Re: User process NFS write hang followed by automount hang requiring reboot
Message-ID: <20190524173155.GQ4158@turtle.email>
References: <20190520223324.GL4158@turtle.email>
X-Mailing-List: linux-nfs@vger.kernel.org

On Tue, May 21, 2019 at 03:46:03PM +0000, Trond Myklebust wrote:
> Have you tried upgrading to 4.19.44? There is a fix that went in not
> too long ago that deals with a request leak that can cause stack traces
> like the above that wait forever.
>

Following up on this. I have set aside a rack of machines and put
Linux 4.19.44 on them. They ran jobs overnight and will do the
same over the long weekend (Memorial Day in the US). Given the
error rate (both over time and over submitted jobs) we see across
the cluster, this will be enough time to draw a conclusion as to
whether 4.19.44 exhibits this hang.

Other than stack traces, what kind of information could I collect
that would be helpful for debugging or describing more precisely
what is happening to these hosts? I'd like to exit from the
condition of trying different kernels (as you no doubt saw in my
initial message, I've done a lot of it) and enter the condition of
debugging or reproducing the problem.

I'll report back early next week and appreciate your feedback,

-A
-- 
Alan Post | Xen VPS hosting for the technically adept
PO Box 61688 | Sunnyvale, CA 94088-1681 | https://prgmr.com/
email: adp@prgmr.com