From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Wengang Wang,
 Andrew Morton, Joseph Qi, Mark Fasheh, Joel Becker, Junxiao Bi,
 Changwei Ge, Gang He, Jun Piao, Linus Torvalds
Subject: [PATCH 4.19 074/101] ocfs2: initialize ip_next_orphan
Date: Tue, 17 Nov 2020 14:05:41 +0100
Message-Id:
 <20201117122116.721172863@linuxfoundation.org>
In-Reply-To: <20201117122113.128215851@linuxfoundation.org>
References: <20201117122113.128215851@linuxfoundation.org>

From: Wengang Wang

commit f5785283dd64867a711ca1fb1f5bb172f252ecdf upstream.

Though the problem was found on an older 4.1.12 kernel, I think upstream
has the same issue.

On one node in the cluster, there is the following call trace:

  # cat /proc/21473/stack
  __ocfs2_cluster_lock.isra.36+0x336/0x9e0 [ocfs2]
  ocfs2_inode_lock_full_nested+0x121/0x520 [ocfs2]
  ocfs2_evict_inode+0x152/0x820 [ocfs2]
  evict+0xae/0x1a0
  iput+0x1c6/0x230
  ocfs2_orphan_filldir+0x5d/0x100 [ocfs2]
  ocfs2_dir_foreach_blk+0x490/0x4f0 [ocfs2]
  ocfs2_dir_foreach+0x29/0x30 [ocfs2]
  ocfs2_recover_orphans+0x1b6/0x9a0 [ocfs2]
  ocfs2_complete_recovery+0x1de/0x5c0 [ocfs2]
  process_one_work+0x169/0x4a0
  worker_thread+0x5b/0x560
  kthread+0xcb/0xf0
  ret_from_fork+0x61/0x90

The above stack is not reasonable: the final iput() shouldn't happen in
ocfs2_orphan_filldir(). Looking at the code,

  2067         /* Skip inodes which are already added to recover list, since dio may
  2068          * happen concurrently with unlink/rename */
  2069         if (OCFS2_I(iter)->ip_next_orphan) {
  2070                 iput(iter);
  2071                 return 0;
  2072         }
  2073

The logic assumes the inode is already in the recover list when it sees
that ip_next_orphan is non-NULL, so it skips this inode after dropping
the reference that was taken in ocfs2_iget().

However, if the inode were already in the recover list, it would have
another reference and the iput() at line 2070 would not be the final
iput (dropping the last reference). So I don't think the inode is really
in the recover list (no vmcore to confirm).

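The reference-count argument above can be sketched in plain userspace C.
This is only a toy model under stated assumptions, not the ocfs2 code:
struct toy_inode, toy_iget(), toy_iput(), toy_orphan_filldir() and
toy_skip_evicts() are hypothetical names, and only the skip check quoted
above is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an in-memory inode; not the real ocfs2 types. */
struct toy_inode {
	int refcount;                  /* references currently held */
	struct toy_inode *next_orphan; /* models ip_next_orphan */
	bool evicted;                  /* set when the last reference is dropped */
};

/* Take a reference, standing in for what ocfs2_iget() gives the caller. */
static void toy_iget(struct toy_inode *inode)
{
	inode->refcount++;
}

/* Drop a reference; dropping the last one triggers eviction. */
static void toy_iput(struct toy_inode *inode)
{
	assert(inode->refcount > 0);
	if (--inode->refcount == 0)
		inode->evicted = true;
}

/* Models only the skip check quoted above, nothing else. */
static void toy_orphan_filldir(struct toy_inode *inode)
{
	toy_iget(inode);
	if (inode->next_orphan) {
		/* Final put only if nobody else holds a reference. */
		toy_iput(inode);
		return;
	}
	/* Otherwise the inode would be queued and the reference kept
	 * (queueing itself is not modeled here). */
}

/* Returns true if the skip path ended up evicting the inode. */
static bool toy_skip_evicts(int refs_held_by_list, struct toy_inode *next)
{
	struct toy_inode inode = { refs_held_by_list, next, false };

	toy_orphan_filldir(&inode);
	return inode.evicted;
}
```

With one reference held by the list, toy_skip_evicts(1, &other) returns
false. With no list reference but a stale non-NULL pointer,
toy_skip_evicts(0, &other) returns true, which corresponds to the
eviction seen under ocfs2_orphan_filldir() in the trace.
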
Note that ocfs2_queue_orphans(), though not shown in the call trace,
holds the cluster lock on the orphan directory while looking up
unlinked inodes. The on-disk inode eviction can involve a lot of I/O,
which may take a long time to finish. That means this node can hold the
cluster lock for a very long time, which can cause lock requests (from
other nodes) on the orphan directory to hang for a long time.

Looking further at ip_next_orphan, I found it's not initialized when
allocating a new ocfs2_inode_info structure.

This causes the reflink operations from some nodes to hang for a very
long time waiting for the cluster lock on the orphan directory.

Fix: initialize ip_next_orphan as NULL.

Signed-off-by: Wengang Wang
Signed-off-by: Andrew Morton
Reviewed-by: Joseph Qi
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Junxiao Bi
Cc: Changwei Ge
Cc: Gang He
Cc: Jun Piao
Cc:
Link: https://lkml.kernel.org/r/20201109171746.27884-1-wen.gang.wang@oracle.com
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman
---
 fs/ocfs2/super.c |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -1747,6 +1747,7 @@ static void ocfs2_inode_init_once(void *
 
 	oi->ip_blkno = 0ULL;
 	oi->ip_clusters = 0;
+	oi->ip_next_orphan = NULL;
 
 	ocfs2_resv_init_once(&oi->ip_la_data_resv);
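
For illustration only, the effect of the one-line fix can be mimicked in
userspace. The sketch below assumes that the memory backing a new
structure may hold stale bytes, simulated here with memset(); struct
toy_inode_info, toy_init_once() and toy_looks_queued() are hypothetical
names and not part of ocfs2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for ocfs2_inode_info. */
struct toy_inode_info {
	unsigned long long blkno;
	unsigned int clusters;
	struct toy_inode_info *next_orphan; /* models ip_next_orphan */
};

/* Models the init-once routine; with_fix selects the patched behaviour. */
static void toy_init_once(struct toy_inode_info *oi, int with_fix)
{
	assert(oi != NULL);
	oi->blkno = 0ULL;
	oi->clusters = 0;
	if (with_fix)
		oi->next_orphan = NULL; /* the line added by the patch */
}

/* Returns 1 if a freshly initialized object looks "already queued". */
static int toy_looks_queued(int with_fix)
{
	struct toy_inode_info oi;

	/* Simulate stale memory contents (assumption for the demo). */
	memset(&oi, 0xA5, sizeof(oi));
	toy_init_once(&oi, with_fix);
	return oi.next_orphan != NULL;
}
```

Without the assignment, toy_looks_queued(0) reports the fresh object as
queued because the stale bytes read as a non-NULL pointer; with it,
toy_looks_queued(1) returns 0.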