From: Olga Kornievskaia
To: trond.myklebust@hammerspace.com, anna.schumaker@netapp.com
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH] pNFS: fix IO thread starvation problem during LAYOUTUNAVAILABLE error
Date: Tue, 31 May 2022 09:48:54 -0400
Message-Id: <20220531134854.63115-1-olga.kornievskaia@gmail.com>
X-Mailing-List: linux-nfs@vger.kernel.org

From: Olga Kornievskaia

In recent pNFS testing we've encountered an IO thread starvation problem
during the time when the server returns a LAYOUTUNAVAILABLE error to the
client. When that happens, each IO request tries to get a new layout,
and the pnfs_update_layout() code ensures that only 1 LAYOUTGET RPC is
outstanding; the rest wait. When the thread that gets the layout wakes
up the waiters, only one gets to run, and it tends to be the one most
recently added to the waiting queue. After receiving the
LAYOUTUNAVAILABLE error, the client falls back to MDS writes. As those
writes complete and new writes are issued, the new requests are added
as waiters and get to run first, so the earliest waiters that were put
on the queue originally never get to run until the LAYOUTUNAVAILABLE
condition resolves itself on the server.

With the current code, if N IOs arrive asking for a layout, then N
serial LAYOUTGETs follow, each getting its own LAYOUTUNAVAILABLE error.
Instead, the patch proposes to apply the error condition to ALL the
waiters for the outstanding LAYOUTGET. Once the error is received, the
code allows all N existing IOs to fall back to the MDS, but any newly
arriving IOs are then queued up and one of the new IOs triggers a new
LAYOUTGET.

Signed-off-by: Olga Kornievskaia
---
 fs/nfs/pnfs.c | 14 +++++++++++++-
 fs/nfs/pnfs.h |  2 ++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 68a87be3e6f9..5b7a679e32c8 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2028,10 +2028,20 @@ pnfs_update_layout(struct inode *ino,
 	if ((list_empty(&lo->plh_segs) || !pnfs_layout_is_valid(lo)) &&
 	    atomic_read(&lo->plh_outstanding) != 0) {
 		spin_unlock(&ino->i_lock);
+		atomic_inc(&lo->plh_waiting);
 		lseg = ERR_PTR(wait_var_event_killable(&lo->plh_outstanding,
 					!atomic_read(&lo->plh_outstanding)));
-		if (IS_ERR(lseg))
+		if (IS_ERR(lseg)) {
+			atomic_dec(&lo->plh_waiting);
 			goto out_put_layout_hdr;
+		}
+		if (test_bit(NFS_LAYOUT_DRAIN, &lo->plh_flags)) {
+			pnfs_layout_clear_fail_bit(lo, pnfs_iomode_to_fail_bit(iomode));
+			lseg = NULL;
+			if (atomic_dec_and_test(&lo->plh_waiting))
+				clear_bit(NFS_LAYOUT_DRAIN, &lo->plh_flags);
+			goto out_put_layout_hdr;
+		}
 		pnfs_put_layout_hdr(lo);
 		goto lookup_again;
 	}
@@ -2152,6 +2162,8 @@ pnfs_update_layout(struct inode *ino,
 		case -ERECALLCONFLICT:
 		case -EAGAIN:
 			break;
+		case -ENODATA:
+			set_bit(NFS_LAYOUT_DRAIN, &lo->plh_flags);
 		default:
 			if (!nfs_error_is_fatal(PTR_ERR(lseg))) {
 				pnfs_layout_clear_fail_bit(lo, pnfs_iomode_to_fail_bit(iomode));
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 07f11489e4e9..5c07da32320b 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -105,6 +105,7 @@ enum {
 	NFS_LAYOUT_FIRST_LAYOUTGET,	/* Serialize first layoutget */
 	NFS_LAYOUT_INODE_FREEING,	/* The inode is being freed */
 	NFS_LAYOUT_HASHED,		/* The layout visible */
+	NFS_LAYOUT_DRAIN,
 };
 
 enum layoutdriver_policy_flags {
@@ -196,6 +197,7 @@ struct pnfs_commit_ops {
 struct pnfs_layout_hdr {
 	refcount_t		plh_refcount;
 	atomic_t		plh_outstanding; /* number of RPCs out */
+	atomic_t		plh_waiting;
 	struct list_head	plh_layouts;   /* other client layouts */
 	struct list_head	plh_bulk_destroy;
 	struct list_head	plh_segs;      /* layout segments list */
-- 
2.27.0
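
The drain behaviour described in the commit message can be walked through
with a small single-threaded userspace model. This is only a sketch under
stated assumptions, not kernel code: struct layout_model, enum io_path and
the three helper functions are hypothetical names invented for
illustration. The fields merely mirror plh_outstanding, plh_waiting and
the NFS_LAYOUT_DRAIN bit, and the locking, refcounting and real wait
queue are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-layout state touched by the patch. */
struct layout_model {
	int outstanding;	/* mirrors plh_outstanding: LAYOUTGETs in flight */
	int waiting;		/* mirrors plh_waiting: IOs parked behind one */
	int layoutgets;		/* model-only counter of LAYOUTGETs issued */
	bool drain;		/* mirrors the NFS_LAYOUT_DRAIN bit */
};

enum io_path {
	IO_SEND_LAYOUTGET,	/* this IO issues the LAYOUTGET */
	IO_WAIT,		/* this IO waits for the one in flight */
	IO_FALLBACK_MDS,	/* no layout: do the IO through the MDS */
	IO_USE_LAYOUT,		/* LAYOUTGET succeeded */
	IO_RETRY_LOOKUP,	/* woken without drain set: look up again */
};

/* An IO arrives and asks for a layout. */
static enum io_path io_arrive(struct layout_model *lo)
{
	if (lo->outstanding != 0) {
		lo->waiting++;
		return IO_WAIT;
	}
	lo->outstanding++;
	lo->layoutgets++;
	return IO_SEND_LAYOUTGET;
}

/*
 * The in-flight LAYOUTGET completes. "unavailable" stands in for the
 * server answering LAYOUTUNAVAILABLE (the -ENODATA case in the patch).
 */
static enum io_path layoutget_done(struct layout_model *lo, bool unavailable)
{
	assert(lo->outstanding > 0);
	lo->outstanding--;
	if (unavailable) {
		lo->drain = true;	/* apply the error to all waiters */
		return IO_FALLBACK_MDS;
	}
	return IO_USE_LAYOUT;
}

/* One waiter is woken after the LAYOUTGET has completed. */
static enum io_path waiter_wake(struct layout_model *lo)
{
	assert(lo->waiting > 0);
	if (lo->drain) {
		/* The last waiter to leave clears the drain condition. */
		if (--lo->waiting == 0)
			lo->drain = false;
		return IO_FALLBACK_MDS;
	}
	/* Modelling choice: the retry path also leaves the waiter count. */
	lo->waiting--;
	return IO_RETRY_LOOKUP;
}
```

In this model, four IOs arriving together produce one LAYOUTGET. After
layoutget_done(&lo, true), each of the three waiters gets
IO_FALLBACK_MDS from waiter_wake(), the last one clears the drain flag,
and only the next newly arriving IO issues a second LAYOUTGET.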