Date: Wed, 29 May 2019 18:41:46 -0600
From: Alan Post
To: "linux-nfs@vger.kernel.org"
Subject: Re: User process NFS write hang followed by automount hang requiring reboot
Message-ID: <20190530004146.GV4158@turtle.email>
References: <20190520223324.GL4158@turtle.email> <20190524173155.GQ4158@turtle.email>
In-Reply-To: <20190524173155.GQ4158@turtle.email>
X-Mailing-List: linux-nfs@vger.kernel.org

On Fri, May 24, 2019 at 11:31:55AM -0600, Alan Post wrote:
> On Tue, May 21, 2019 at 03:46:03PM +0000, Trond Myklebust wrote:
> > Have you tried upgrading to 4.19.44? There is a fix that went in not
> > too long ago that deals with a request leak that can cause stack traces
> > like the above that wait forever.
> >
>
> Following up on this. I have set aside a rack of machines and put
> Linux 4.19.44 on them. They ran jobs overnight and will do the
> same over the long weekend (Memorial day in the US). Given the
> error rate (both over time and over submitted jobs) we see across
> the cluster this well be enough time to draw a conclusion as to
> whether 4.19.44 exhibits this hang.
>

In the six days I've run Linux 4.19.44 on a single rack, I've seen
no occurrences of this hang. Given the incident rate for this
issue across the cluster over the same period of time, I would have
expected to see one or two incidents on the rack running 4.19.44.

This is promising--I'm going to deploy 4.19.44 to another rack
by the end of the day Friday, May 31st and hope for more of the same.

I wondered upthread whether the following commits were what you
had in mind when you asked about 4.19.44:

    63b0ee126f7e: NFS: Fix an I/O request leakage in nfs_do_recoalesce
    be74fddc976e: NFS: Fix I/O request leakages

Confirming that it is these patches and no others has become
topical for me: my upstream is now providing a 4.19.37 build,
and I note these two patches have been included since 4.19.31, so
they are presumably in my now-available upstream 4.19.37 build.

If I could trouble you to confirm whether or not this is the
complete set of patches you had in mind for the 4.19 branch
after 4.19.28 when you recommended I try 4.19.44, I would
appreciate it.

Lurking on the list for the past week or two and watching
everyone's work has been inspiring. Thank you again. I'll
report back no later than next week.

-A
-- 
Alan Post | Xen VPS hosting for the technically adept
PO Box 61688 | Sunnyvale, CA 94088-1681 | https://prgmr.com/
email: adp@prgmr.com
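
A rough way to read the "one or two incidents expected, none seen"
observation in the message above: if hangs are assumed to arrive
independently at a steady rate (a Poisson model, which is an assumption
and not something stated in the thread), the chance of seeing zero
incidents purely by luck is exp(-expected). The function name below is
hypothetical, and the expected counts of 1 and 2 are taken from the
message. This is only a sketch of the reasoning, not a measurement.

```python
import math


def prob_zero_incidents(expected_incidents: float) -> float:
    """Probability of observing zero events under a Poisson model.

    expected_incidents is the mean number of hangs one would expect
    in the observation window if the fix had no effect (assumed input).
    """
    if expected_incidents < 0:
        raise ValueError("expected_incidents must be non-negative")
    return math.exp(-expected_incidents)


# Expected counts quoted in the message: one or two incidents in six days.
for expected in (1.0, 2.0):
    p = prob_zero_incidents(expected)
    print(f"expected={expected:.0f}: P(zero incidents by chance) = {p:.3f}")
```

Under this assumption, a clean six days would happen by chance roughly
14% to 37% of the time even without the fix, which fits the cautious
"promising" wording and the plan to add a second rack before drawing a
conclusion.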