From: "Benjamin Coddington"
To: "Trond Myklebust"
Cc: linux-nfs@vger.kernel.org, anna.schumaker@netapp.com
Subject: Re: [PATCH 1/2 v2] NFSv4: Fix a livelock when CLOSE pre-emptively bumps state sequence
Date: Tue, 22 Sep 2020 18:38:20 -0400
Message-ID: <5A0929F2-2420-4834-BC83-F7A97297C1B3@redhat.com>
In-Reply-To:
References: <75089C5A-31C2-4BF2-AD01-BC9D923FCE4B@redhat.com>

On 22 Sep 2020, at 17:46, Trond Myklebust wrote:

> On Tue, 2020-09-22 at 17:15 -0400, Trond Myklebust wrote:
>> On Tue, 2020-09-22 at 17:08 -0400, Benjamin Coddington wrote:
>>> On 22 Sep 2020, at 15:48, Trond Myklebust wrote:
>>>
>>>> On Tue, 2020-09-22 at 15:15 -0400, Benjamin Coddington wrote:
>>>>> Since commit 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
>>>>> CLOSE/OPEN_DOWNGRADE") the following livelock may occur if a CLOSE
>>>>> races with the update of the nfs_state:
>>>>>
>>>>> Process 1               Process 2               Server
>>>>> =========               =========               ========
>>>>> OPEN file
>>>>>                         OPEN file
>>>>>                                                 Reply OPEN (1)
>>>>>                                                 Reply OPEN (2)
>>>>> Update state (1)
>>>>> CLOSE file (1)
>>>>>                                                 Reply OLD_STATEID (1)
>>>>> CLOSE file (2)
>>>>>                                                 Reply CLOSE (-1)
>>>>>                         Update state (2)
>>>>>                         wait for state change
>>>>> OPEN file
>>>>>                         wake
>>>>>                         CLOSE file
>>>>> OPEN file
>>>>>                         wake
>>>>>                         CLOSE file
>>>>> ...                     ...
>>>>>
>>>>> As long as the first process continues updating state, the second
>>>>> process will fail to exit the loop in nfs_set_open_stateid_locked().
>>>>> This livelock has been observed in generic/168.
>>>>>
>>>>> Fix this by detecting the case in nfs_need_update_open_stateid() and
>>>>> then exiting the loop if:
>>>>>   - the state is NFS_OPEN_STATE, and
>>>>>   - the stateid sequence is > 1, and
>>>>>   - the stateid doesn't match the current open stateid
>>>>>
>>>>> Fixes: 0e0cb35b417f ("NFSv4: Handle NFS4ERR_OLD_STATEID in
>>>>> CLOSE/OPEN_DOWNGRADE")
>>>>> Cc: stable@vger.kernel.org # v5.4+
>>>>> Signed-off-by: Benjamin Coddington
>>>>> ---
>>>>>  fs/nfs/nfs4proc.c | 27 +++++++++++++++++----------
>>>>>  1 file changed, 17 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>>> index 6e95c85fe395..9db29a438540 100644
>>>>> --- a/fs/nfs/nfs4proc.c
>>>>> +++ b/fs/nfs/nfs4proc.c
>>>>> @@ -1588,18 +1588,25 @@ static void nfs_test_and_clear_all_open_stateid(struct nfs4_state *state)
>>>>>  static bool nfs_need_update_open_stateid(struct nfs4_state *state,
>>>>>  		const nfs4_stateid *stateid)
>>>>>  {
>>>>> -	if (test_bit(NFS_OPEN_STATE, &state->flags) == 0 ||
>>>>> -	    !nfs4_stateid_match_other(stateid, &state->open_stateid)) {
>>>>> -		if (stateid->seqid == cpu_to_be32(1))
>>>>> +	bool state_matches_other = nfs4_stateid_match_other(stateid, &state->open_stateid);
>>>>> +	bool seqid_one = stateid->seqid == cpu_to_be32(1);
>>>>> +
>>>>> +	if (test_bit(NFS_OPEN_STATE, &state->flags)) {
>>>>> +		/* The common case - we're updating to a new sequence number */
>>>>> +		if (state_matches_other &&
>>>>> +		    nfs4_stateid_is_newer(stateid, &state->open_stateid)) {
>>>>> +			nfs_state_log_out_of_order_open_stateid(state, stateid);
>>>>> +			return true;
>>>>> +		}
>>>>> +	} else {
>>>>> +		/* This is the first OPEN */
>>>>> +		if (!state_matches_other && seqid_one) {
>>>>
>>>> Why are we looking at the value of state_matches_other here? If the
>>>> NFS_OPEN_STATE flag is not set, then the state->open_stateid is not
>>>> initialised, so how does it make sense to look at a comparison with
>>>> the incoming stateid?
>>>
>>> You're right - it doesn't make sense. I failed to clean it up out of my
>>> decision matrix. I'll send a v3 after it gets tested overnight.
>>>
>>> Thanks for the look, and if you have another moment - what do you think
>>> about not bumping the seqid in the CLOSE/OPEN_DOWNGRADE path on
>>> OLD_STATEID if the state's refcount > 1? This would optimize away a lot
>>> of recovery for races like this, but I wonder if it would be possible
>>> to end up with two OPEN_DOWNGRADEs dueling that would never recover.
>>
>> It would lead to a return of the infinite loop in cases where the
>> client misses a reply to an OPEN or CLOSE due to a soft timeout (which
>> is an important use case for us).

I hadn't thought about that, but doesn't the task return and decrement the
refcount on nfs4_state?
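
Concretely, the refcount idea would be something like the rough, untested
sketch below against the NFS4ERR_OLD_STATEID case in nfs4_close_done(). I'm
assuming here that refcount_read() on state->count is the right way to ask
"does anyone else hold this nfs4_state", and that a restarted CLOSE picks up
the current open_stateid again in its ->rpc_call_prepare:

	case -NFS4ERR_OLD_STATEID:
		/*
		 * If another task still holds a reference to this
		 * nfs4_state, a racing OPEN reply may be about to update
		 * the open stateid for us: skip the pre-emptive seqid
		 * bump and just restart, retrying with whatever stateid
		 * the state holds by then.
		 */
		if (refcount_read(&state->count) > 1)
			goto out_restart;
		if (nfs4_refresh_open_old_stateid(&calldata->arg.stateid,
					state))
			goto out_restart;
		break;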

> In principle, RFC5661 says that the client is supposed to process all
> stateid operations in the order dictated by the seqid. The one problem
> with that mandate is that open-by-filename doesn't allow us to know
> whether or not a seqid bump is outstanding. This is why we have the 5
> second timeout wait in nfs_set_open_stateid_locked().
>
> We could do something similar to that for CLOSE by setting the seqid to
> 0, and then ensuring we process the CLOSE in the order the server ends
> up processing it. Unfortunately, that would require us to replay any
> OPEN calls that got shut down because we used seqid 0 (it would also
> not work for NFSv4.0)... So perhaps the best solution would be to
> instead allow CLOSE to wait for outstanding OPENs to complete, just
> like we do in nfs_set_open_stateid_locked()? Thoughts?

How can you know if there are outstanding OPENs? I thought that was the
whole intention of the CLOSE -> OLD_STATEID retry bump -- you cannot know
if the server was able to process an interrupted OPEN.

I had a dumb patch that just caused CLOSE to delay a bit before retrying
if it received OLD_STATEID, but I figured that was vulnerable to another
livelock where another process could just keep doing open(), close() as
well.
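
For reference, that dumb patch amounted to little more than the following
(reconstructed for illustration, not the actual patch; the use of
rpc_delay() with NFS4_POLL_RETRY_MIN for the backoff is an assumption):

	case -NFS4ERR_OLD_STATEID:
		/*
		 * Back off briefly so a racing OPEN reply can update the
		 * open stateid, then retry the CLOSE with whatever
		 * stateid the state holds by then.
		 */
		rpc_delay(task, NFS4_POLL_RETRY_MIN);
		goto out_restart;

Ben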