Received: by 2002:a05:6a10:413:0:0:0:0 with SMTP id 19csp2197028pxp; Mon, 21 Mar 2022 13:37:31 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxzQMWZGXMl7y0VslGhewLV/1ZyiJMJ3Xr/ZCKe/47ACSKzlCwjjyjWJebM9lrHsUTAikW8 X-Received: by 2002:a05:6402:5250:b0:419:2902:2833 with SMTP id t16-20020a056402525000b0041929022833mr11974200edd.153.1647895050846; Mon, 21 Mar 2022 13:37:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1647895050; cv=none; d=google.com; s=arc-20160816; b=PJB1A+ws9Ul+KVKjubJq+8ZV2l3LwLA2GLihBdIVes4+MOf1URUI/fSuEbiXwi5OiV Pnlx4hgIOb2m2YGLv0+kNRSOhvBTQuYmsbK8nVxBFJ34vi+KcSCri8ESJZuCKYps2Sd5 FlIlgSN1PvJKNQONJopZyRc27vTEzHTkb5n8jDYEAhOwGEHX4qiXKYK6ENu6lLshaAMg Gc5FE3bAQeN49UEuHGS1eHU1373cm5OqK6LD2IwedKFE/3h1TtC7zrfkhxLGZHm7qBi6 XgHzV2mzX9hqiIpSogeTgslSWn6X2oIajkH8UwYKfm7oIkBN95/+fh7jm0FU9De/NC+C oVdw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date:dkim-signature :dkim-signature; bh=m6kfUSHigNFp47MyL0GAHesqzazPfSGpZmi+Qk1a6W0=; b=Iu8WwACqLu5BOyjKhWPC7/3J8S4qXKVihAXEMfPrUZDdVnU5rb1o9v0Gf8KUdFdpkJ aFfaVH4ywlwFiSJprrE5Ows5T9H95ijHk5MT39eV8ngPjqaxv6StBrYlW6zf3LUUgab8 wzJda5Kp2Kw+UV8bT49Kv1qSgDyr7Gy/ePsmqHwjh/sN1qzZikivD+buyiygkJvJEtCQ LsEGQCrVhllh3d6c/37radoIRv5VLKGcKYI2ls684uVDV2OmHUtWHqlkOGCXjid/9gBA VluPRRe52GJr5XcsBllMvCUN9yY9ZNx/yNInL9pgjreAZKPIF7ANc+BxoKdO+20M/ZNq g3jw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=3K9Z+Awv; dkim=neutral (no key) header.i=@suse.cz header.s=susede2_ed25519 header.b="/vcg1T9q"; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m9-20020a509309000000b00418c2b5bf01si9974824eda.483.2022.03.21.13.37.06; Mon, 21 Mar 2022 13:37:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.cz header.s=susede2_rsa header.b=3K9Z+Awv; dkim=neutral (no key) header.i=@suse.cz header.s=susede2_ed25519 header.b="/vcg1T9q"; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1349157AbiCUOOx (ORCPT + 99 others); Mon, 21 Mar 2022 10:14:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38456 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351262AbiCUOKt (ORCPT ); Mon, 21 Mar 2022 10:10:49 -0400 Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.220.28]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 480B316F061 for ; Mon, 21 Mar 2022 07:06:49 -0700 (PDT) Received: from relay2.suse.de (relay2.suse.de [149.44.160.134]) by smtp-out1.suse.de (Postfix) with ESMTP id 0EDDB210E2; Mon, 21 Mar 2022 14:06:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa; t=1647871581; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=m6kfUSHigNFp47MyL0GAHesqzazPfSGpZmi+Qk1a6W0=; b=3K9Z+AwvKXQe63VIaavuILVrJYjnm6SBZlKvRHcbQz+bK7uCqP6l+BsqesoMfd+8oPhVa+ hsLmeRNiCsR5l0UnIpcQNGX4qETXWGv53nZTaOLkQgXctYlJOhWs8gPKYrc2I2DtBHIIo6 S0vFTF5lDTx/MpBFbJD02NUdYl2Hoxw= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_ed25519; t=1647871581; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=m6kfUSHigNFp47MyL0GAHesqzazPfSGpZmi+Qk1a6W0=; b=/vcg1T9qzuUMnzoF9bWNiayf/UjHKf3CNqU7tnka/U6Wx+6NFveKSt4kRDs56D/FSBL2Ro qxLW3qTG20BbheCw== Received: from quack3.suse.cz (unknown [10.100.224.230]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by relay2.suse.de (Postfix) with ESMTPS id F08D6A3B99; Mon, 21 Mar 2022 14:06:20 +0000 (UTC) Received: by quack3.suse.cz (Postfix, from userid 1000) id 9F429A0610; Mon, 21 Mar 2022 15:06:17 +0100 (CET) Date: Mon, 21 Mar 2022 15:06:17 +0100 From: Jan Kara To: harshad shirwadkar Cc: Jan Kara , Ext4 Developers List , Ritesh Harjani , "Theodore Y. Ts'o" Subject: Re: [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait Message-ID: <20220321140617.wc2ohrorqny4leyc@quack3.lan> References: <20220308163319.1183625-1-harshads@google.com> <20220308163319.1183625-3-harshads@google.com> <20220309101426.qumxztpd4weqzrcs@quack3.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Sorry for delayed reply, I've got dragged into other things and somewhat forgot about this... On Fri 11-03-22 00:25:48, harshad shirwadkar wrote: > Hmm, after removing ext4_fc_track_inode() from ext4_journal_start(), I > see one deadlock - there are some places in code where > ext4_mark_inode_dirty gets called while holding i_data_sem. Commit > path requires i_data_sem to commit inode data (via ext4_map_blocks). > So, if an inode is being committed, ext4_mark_inode_dirty may start > waiting for the inode to commit while holding i_data_sem and the > commit path may wait on i_data_sem. Indeed, that is a problem. > The right way to fix this is to > call ext4_fc_track_inode in such places before acquiring i_data_sem in > write mode. But that would mean we sprinkle code with more > ext4_fc_track_inode() calls which is something that I preferably > wanted to avoid. So I agree calling ext4_fc_track_inode() from ext4_reserve_inode_write() isn't going to fly as is. We need to find a better way of doing things. > This makes me wonder though, for fast commits, is it a terrible idea > to extend the meaning of ext4_journal_start() from "start a new > handle" to "start a new handle with an intention to modify the passed > inode"? Most of the handles modify only one inode, and for other > places where we do modify multiple inodes, ext4_reserve_inode_write() > would ensure that those inodes are tracked as well. Well but to avoid deadlocks like you've described above you would have to start tracking inode with explicit ext4_fc_track_inode() calls before grabbing i_data_sem. ext4_reserve_inode_write() would be only useful for an assertion check that indeed the inode is already tracked. > All of the > existing places where inode gets modified after grabbing i_data_sem, > i_data_sem is grabbed only after starting the handle. This would take > care of the deadlock mentioned above and similar deadlocks. Another > advantage with doing this is that developers wouldn't need to worry > about adding explicit ext4_fc_track_inode() calls for future changes. Well, at least for the simple case where only one inode is modified. But I agree that's a majority. > If we decide to do this, we would need to do a thorough code review to > ensure that the above rule is followed everywhere. But note that > ext4_fc_track_inode() is idempotent, so it doesn't matter if this > function gets called multiple times in the same handle. So to avoid > breaking fast commits, we can be super careful and in the first > version, we can have ext4_fc_track_range() calls in > ext4_reserve_inode_dirty(), ext4_journal_start(), inline.c and in > handles where i_data_sem gets acquired in write mode. We can then > carefully evaluate each code path and remove redundant > ext4_fc_track_range() calls. As I wrote above, if we go this path, I'd be for ext4_fc_track_inode() calls in ext4_journal_start() and then adding explicit calls to ext4_fc_track_inode() where additionally needed and have only assertion checks in ext4_reserve_inode_dirty() and other places which modify inode metadata, to catch places which need explicit calls to ext4_fc_track_inode(). That way we won't have any hidden deadlocks in the code waiting to happen. I have also another proposal for consideration how we could handle this. Mostly branstorming now: We could also drop the need for fastcommit code to acquire i_data_sem during commit. We could use just information in extent status tree to provide block mapping for the fastcommit code (that does not need i_data_sem). The only thing is we'd need to make sure modified extents from the status tree are not evicted from memory before the appropriate transaction commits so that they are available for appropriate fastcommit - for that we'd probably need to add TID of the transaction that last modified an extent and add check into the shrinker to avoid shrinking uncommitted extents. As a bonus, we could now add to fastcommit only extents with appropriate TID and thus save on extent logging for sparsely modified inode. Honza -- Jan Kara SUSE Labs, CR