From: "Benjamin Coddington"
To: "Anna Schumaker"
Cc: "Trond Myklebust", "Linux NFS Mailing List"
Subject: Re: [PATCH 1/3] NFSv4: Fix a livelock when CLOSE pre-emptively bumps state sequence
Date: Tue, 22 Sep 2020 14:47:09 -0400
Message-ID: <068EFB54-D0B0-42C2-9408-603F10918FD7@redhat.com>
In-Reply-To:
References: <5a7f6bbf4cf2038634a572f42ad80e95a8d0ae9c.1600686204.git.bcodding@redhat.com> <8DB79D4D-6986-4114-B031-43157089C2B5@redhat.com>

On 22 Sep 2020, at 12:11, Anna Schumaker wrote:

> On Tue, Sep 22, 2020 at 11:53 AM Anna Schumaker wrote:
>>
>> On Tue, Sep 22, 2020 at 11:49 AM Benjamin Coddington wrote:
>>>
>>> On 22 Sep 2020, at 10:43, Anna Schumaker wrote:
>>>
>>>> On Tue, Sep 22, 2020 at 10:31 AM Anna Schumaker wrote:
>>>>>
>>>>> On Tue, Sep 22, 2020 at 10:22 AM Benjamin Coddington wrote:
>>>>>>
>>>>>> On 22 Sep 2020, at 10:03, Anna Schumaker wrote:
>>>>>>> Hi Ben,
>>>>>>>
>>>>>>> Once I apply this patch I have trouble with generic/478 doing lock
>>>>>>> reclaim:
>>>>>>>
>>>>>>> [ 937.460505] run fstests generic/478 at 2020-09-22 09:59:14
>>>>>>> [ 937.607990] NFS: __nfs4_reclaim_open_state: Lock reclaim failed!
>>>>>>>
>>>>>>> And the test just hangs until I kill it.
>>>>>>>
>>>>>>> Just thought you should know!
>>>>>>
>>>>>> Yes, thanks! I'm not seeing that. I've tested these based on v5.8.4;
>>>>>> I'll rebase and check again. A wire capture of generic/478 is only
>>>>>> 515K on my system. Would you be willing to share a capture of your
>>>>>> failing test?
>>>>>
>>>>> I have it based on v5.9-rc6 (plus the patches I have queued up for
>>>>> v5.10), so there definitely could be a difference there! I'm using a
>>>>> stock kernel on my server, though :)
>>>>>
>>>>> I can definitely get you a packet trace once I re-apply the patch and
>>>>> rerun the test.
>>>>
>>>> Here's the packet trace. I reran the test with just this patch applied
>>>> on top of v5.9-rc6, so it's not interacting with something else in my
>>>> tree. It looks like it's ending up in an NFS4ERR_OLD_STATEID loop.
>>>
>>> Thanks very much!
>>>
>>> Did you see this failure with all three patches applied, or just with
>>> the first patch?
>>
>> I saw it with the first patch applied, and with the first and third
>> applied. I initially hit it as I was wrapping up for the day yesterday,
>> but I left out #2 since I saw your retraction.
>
> I reran with all three patches applied and didn't have the issue. So
> something in the refactor patch fixes it.

That helped me see that the case we're not handling correctly is when two
OPENs race and the second one tries to update the state first and gets
dropped. The 2/3 refactor patch fixes it because the refactor is a bit
more explicit about that case.

That means I'll need to fix those two patches and send them again. I'm
very glad you caught this! Thanks very much for helping me find the
problem.

Ben
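
P.S. For anyone following along, here is a minimal userspace sketch of the
wraparound-safe seqid comparison that decides whether an incoming stateid
should replace the one the client holds. This is my own simplification,
not the kernel's actual code: the names (struct stateid, stateid_is_newer)
and the example seqid values are hypothetical, and the 12-byte "other"
field of a real NFSv4 stateid is omitted. It shows how a locally bumped
seqid can make a racing OPEN reply look stale:

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for an NFSv4 stateid; only the seqid matters here. */
struct stateid {
    uint32_t seqid;
};

/*
 * Serial-number comparison: "incoming" is newer than "held" when the
 * difference, taken as a signed 32-bit value, is positive. This stays
 * correct when the 32-bit seqid wraps around.
 */
static bool stateid_is_newer(const struct stateid *held,
                             const struct stateid *incoming)
{
    return (int32_t)(incoming->seqid - held->seqid) > 0;
}

int main(void)
{
    struct stateid held = { .seqid = 3 };       /* seqid bumped locally */
    struct stateid open_reply = { .seqid = 2 }; /* racing OPEN reply */

    /* Prints "no": the racing reply looks old and would be dropped. */
    printf("update from OPEN reply? %s\n",
           stateid_is_newer(&held, &open_reply) ? "yes" : "no");
    return 0;
}

If the client then keeps presenting a stateid the server considers stale,
each retry can come back NFS4ERR_OLD_STATEID, which would be consistent
with the loop in Anna's capture.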