From: Dai Ngo
To: chuck.lever@oracle.com, jlayton@kernel.org
Cc: linux-nfs@vger.kernel.org, linux-nfs@stwm.de
Subject: [PATCH 3/3] NFSD: Fix server reboot hang problem when callback workqueue is stuck
Date: Fri, 15 Dec 2023 11:15:03 -0800
Message-Id: <1702667703-17978-4-git-send-email-dai.ngo@oracle.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1702667703-17978-1-git-send-email-dai.ngo@oracle.com>
References: <1702667703-17978-1-git-send-email-dai.ngo@oracle.com>
X-Mailing-List: linux-nfs@vger.kernel.org

If the callback workqueue is stuck, nfsd4_deleg_getattr_conflict also
gets stuck waiting for the callback request to be executed. This causes
the client to hang waiting for the GETATTR reply, and it causes a reboot
of the NFS server to hang because of the pending NFS request.

Fix by replacing wait_on_bit with wait_on_bit_timeout, using a 20-second
timeout.

Reported-by: Wolfgang Walter
Fixes: 6c41d9a9bd02 ("NFSD: handle GETATTR conflict with write delegation")
Signed-off-by: Dai Ngo
---
 fs/nfsd/nfs4state.c | 6 +++++-
 fs/nfsd/state.h     | 2 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 175f3e9f5822..0cc7d4953807 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2948,6 +2948,9 @@ void nfs4_cb_getattr(struct nfs4_cb_fattr *ncf)
 	if (test_and_set_bit(CB_GETATTR_BUSY, &ncf->ncf_cb_flags))
 		return;
 
+	/* set to proper status when nfsd4_cb_getattr_done runs */
+	ncf->ncf_cb_status = NFS4ERR_IO;
+
 	refcount_inc(&dp->dl_stid.sc_count);
 	if (!nfsd4_run_cb(&ncf->ncf_getattr)) {
 		refcount_dec(&dp->dl_stid.sc_count);
@@ -8558,7 +8561,8 @@ nfsd4_deleg_getattr_conflict(struct svc_rqst *rqstp, struct inode *inode,
 			nfs4_cb_getattr(&dp->dl_cb_fattr);
 			spin_unlock(&ctx->flc_lock);
-			wait_on_bit(&ncf->ncf_cb_flags, CB_GETATTR_BUSY, TASK_INTERRUPTIBLE);
+			wait_on_bit_timeout(&ncf->ncf_cb_flags, CB_GETATTR_BUSY,
+				TASK_INTERRUPTIBLE, NFSD_CB_GETATTR_TIMEOUT);
 			if (ncf->ncf_cb_status) {
 				status = nfserrno(nfsd_open_break_lease(inode, NFSD_MAY_READ));
 				if (status != nfserr_jukebox ||
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index f96eaa8e9413..94563a6813a6 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -135,6 +135,8 @@ struct nfs4_cb_fattr {
 /* bits for ncf_cb_flags */
 #define CB_GETATTR_BUSY		0
 
+#define NFSD_CB_GETATTR_TIMEOUT	msecs_to_jiffies(20000) /* 20 secs */
+
 /*
  * Represents a delegation stateid. The nfs4_client holds references to these
  * and they are put when it is being destroyed or when the delegation is
-- 
2.39.3
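
For readers who want to see the pattern outside the kernel, here is a
minimal userspace sketch of the two ideas in this patch: the status is
preset to an error before the callback is queued, and the waiter gives
up after a bounded time. This is not the nfsd code. struct cb_state,
cb_start(), cb_complete(), cb_wait() and CB_STATUS_IO are hypothetical
names invented for illustration, and the wait is a simple poll with
nanosleep() rather than the kernel's wait_on_bit_timeout().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define CB_STATUS_IO 5	/* hypothetical stand-in for NFS4ERR_IO */

/* Hypothetical userspace stand-in for the callback state in the patch. */
struct cb_state {
	atomic_bool busy;	/* plays the role of the CB_GETATTR_BUSY bit */
	atomic_int status;	/* plays the role of ncf_cb_status */
};

/* Queue side: preset an error status, then mark the callback in flight. */
static void cb_start(struct cb_state *cb)
{
	atomic_store(&cb->status, CB_STATUS_IO);
	atomic_store(&cb->busy, true);
}

/* Completion side: record the real status, then clear the busy flag. */
static void cb_complete(struct cb_state *cb, int status)
{
	atomic_store(&cb->status, status);
	atomic_store(&cb->busy, false);
}

/* Milliseconds elapsed on the monotonic clock since *start. */
static long long elapsed_ms(const struct timespec *start)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return ((long long)(now.tv_sec - start->tv_sec) * 1000000000LL +
		(now.tv_nsec - start->tv_nsec)) / 1000000LL;
}

/*
 * Wait until busy clears or timeout_ms expires, polling every 1 ms.
 * On timeout the preset error status is what the caller sees.
 */
static int cb_wait(struct cb_state *cb, long timeout_ms)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000L };
	struct timespec start;

	assert(timeout_ms >= 0);
	clock_gettime(CLOCK_MONOTONIC, &start);
	while (atomic_load(&cb->busy) && elapsed_ms(&start) < timeout_ms)
		nanosleep(&tick, NULL);
	return atomic_load(&cb->status);
}
```

In this sketch a caller runs cb_start(), hands the work to another
thread that eventually calls cb_complete(), and then calls
cb_wait(&cb, 20000). If the worker never runs, cb_wait() returns
CB_STATUS_IO after the timeout instead of blocking forever, which is
the behaviour the patch aims for in nfsd4_deleg_getattr_conflict.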