Date: Tue, 18 Jan 2022 08:38:52 +1030
From: Jonathan Woithe
To: Chuck Lever III
Cc: Bruce Fields, Linux NFS Mailing List
Subject: Re: [Bug report] Recurring oops, 5.15.x, possibly during or soon after client mount
Message-ID: <20220117220851.GA8494@marvin.atrad.com.au>
References: <20220114103901.GA22009@marvin.atrad.com.au>
 <20220115081420.GB8808@marvin.atrad.com.au>
 <927EED04-840E-4DA6-B2B1-B604A7577B4E@oracle.com>
 <20220115212336.GB30050@marvin.atrad.com.au>
 <20220116220627.GA19813@marvin.atrad.com.au>
 <1E71316C-9EE8-4C71-ADA1-71E2910CA070@oracle.com>
 <20220117074430.GA22026@marvin.atrad.com.au>
In-Reply-To: <20220117074430.GA22026@marvin.atrad.com.au>
X-Mailing-List: linux-nfs@vger.kernel.org

On
Mon, Jan 17, 2022 at 06:14:30PM +1030, Jonathan Woithe wrote:
> > >>>>> A possible culprit is 7f024fcd5c97 ("Keep read and write fds with each
> > >>>>> nlm_file"), which was introduced in or around v5.15. You could try a
> > >>>>> simple test and back the server down to v5.14.y to see if the problem
> > >>>>> persists.
> > >
> > > FYI I have now put the kernel.org 5.14.21 kernel on the affected system and
> > > booted it. Since the oops has taken between 1 and 2 weeks to be triggered
> > > in the past, we may have to wait a few weeks to be certain of an outcome.
> > > If there's anything else you need from me in the interim please ask.
> >
> > If you identify a particular client that triggers the issue, it would be
> > helpful to know:
> >
> > - The client's kernel version
> > - What was running on the client before it was shut down
> > - Whether the application and client shut down was clean
>
> I have been able to identify the client involved. It was the same client
> on both occasions. That client is running the 4.4.14 kernel.
> :
> I will ask the user if they remember anything happening differently on the
> days of the server oops.

I have asked the user, and certainly in the case of the most recent oops the
previous day's usage (that is, the day of the unclean shutdown, the day
before the boot which triggered the server oops) was nothing out of the
ordinary. Firefox, thunderbird and libreoffice were the only applications
used, with the desktop file browser also getting an outing. The desktop is
xfce4. These programs would have been used variously over the course of the
day (roughly 7.5 hours on this particular date).

> With the server running 5.14.21, I did a reset of the client (that is,
> unclean shutdown) just before I left this evening. The server did not oops
> when the client was rebooted a minute or so later.
> I will see if I can
> repeat the test with 5.15.12 tomorrow morning before others get in if you
> think that will be helpful in light of the above observations.

I did this test this morning before others came in. The server (with
5.15.12 running) did not oops. However, with the recent mention of locking
this may not be surprising, since no NFS locking had been attempted on the
client during the test (mainly because I had no easy way to elicit a lock).
I merely booted the client, reset it and let it boot again. During the
course of the day the client will run firefox, thunderbird and libreoffice,
all of which probably involve locking of various descriptions. Thus a test
without locking is perhaps not perfect.

I am happy to run further tests if it will help. Let me know if I can do
anything else.

Regards
  jonathan
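
For reference, one way a lock might be elicited on the client for such a
test is sketched below. This is only a minimal sketch and has not been run
on the affected systems. The function name hold_lock and any path passed to
it are hypothetical. It assumes the file lives on the NFS mount and that the
mount is not using local-only locking, so that the fcntl byte-range lock is
actually passed to the server.

```python
import fcntl
import os
import time


def hold_lock(path, seconds):
    """Take an exclusive fcntl lock on `path` and hold it for `seconds`.

    `path` is assumed (not verified here) to be on the NFS mount under test.
    """
    # Create the file if needed; a write-capable descriptor is required
    # for an exclusive lock.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Whole-file exclusive lock; blocks until it is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        # For an unclean-shutdown test the client would be reset during
        # this sleep, while the lock is still held.
        time.sleep(seconds)
        fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
    return True
```

Calling hold_lock() with a file on the mount and a long hold time (an hour,
say), then resetting the client while it sleeps, would give a test in which
the client went down holding a lock. Whether that matches what the desktop
applications do is a separate question.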