Received: by 2002:a05:6a10:144:0:0:0:0 with SMTP id 4csp21465pxw; Thu, 7 Apr 2022 23:16:18 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxgG8+Z8yTGYl9oDGO0CnnAr5QqX/c+LmTsTPSZfvomtAVwNrK3WEcr28sVkpt4mkzHCQbv X-Received: by 2002:a17:90b:1c87:b0:1ca:f4e:4fbe with SMTP id oo7-20020a17090b1c8700b001ca0f4e4fbemr19980485pjb.159.1649398578752; Thu, 07 Apr 2022 23:16:18 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1649398578; cv=none; d=google.com; s=arc-20160816; b=FogT817n0CRn9KxrYRiH3sUNya7gWO+RK8Wuwm4ayw1k/a78tlQWq9TvB8Ij0uT7Tc CuI0aARDSpAlxtBX8vzD7QQk2DHN9DqOmVdUryXstmRjcSUvfOlaGTlpBsWFls9TgFYJ r92T0t0SvlYIQCJsBEHQ9i+3aGakgS3ItZQPaSw7ezH+FV7rGEj5UkRyovaTTOUW38J5 4+Qxrxn+fhkYB1BzWW2ztkwjzpxttrRv3wpCBcHD3FY2Wcdl6qeIxm3TDCqyUvdA8Inp m6vNQ/0AO33RbKLQrPLPsMp29ApLReyVObtOZYT4JJYkNXvj0bNFcl5+ikzB4eApWIpy 3UyQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:content-language :in-reply-to:mime-version:user-agent:date:message-id:from:references :cc:to:subject; bh=9qBsaly8XN9FstrxZ1RWay7sbDUUu/+tVNlrk7Uzb4Q=; b=KNdYbw1X9lsqkmN+Qfc0l7Z/W3mk/YBJL39NPDGZm9Ts1FBRQviQgxct0bo7yj43ro Cbem/sQMezi4VCGLvueoY+sScTDjbxARx+z7ahePj+KIQfca4LvU+Pj73/xb4kawgau/ Q+IvjWP/pJ/RF1uJghXivHZ+Upk71+RRyMATFSQGaYM3zSDBvrqmTd1WaQYm27j3kPSC 3Iqffs+VVKF9AohtpZGGu1v8nLWUhnfpxNk7U81Rp+osv8owQaw5QuwiaN/VzxUVQvjb 80ZNBX5mIsbkLSZznFUHfPYLkD71XPQxvgbHCp9CAvwxxeELeiaQKkyPjKDfjOZJMF5F Cr3A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [2620:137:e000::1:18]) by mx.google.com with ESMTPS id hk14-20020a17090b224e00b001caa0f5202asi498573pjb.44.2022.04.07.23.16.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 07 Apr 2022 23:16:18 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) client-ip=2620:137:e000::1:18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 2620:137:e000::1:18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 9EC183BA56; Thu, 7 Apr 2022 22:46:06 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234938AbiDHFsG (ORCPT + 99 others); Fri, 8 Apr 2022 01:48:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59868 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235005AbiDHFrb (ORCPT ); Fri, 8 Apr 2022 01:47:31 -0400 Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C738016F6DF for ; Thu, 7 Apr 2022 22:45:27 -0700 (PDT) Received: from canpemm500005.china.huawei.com (unknown [172.30.72.53]) by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4KZRzD3pbxz1HBdS; Fri, 8 Apr 2022 13:44:56 +0800 (CST) Received: from [10.174.178.134] (10.174.178.134) by canpemm500005.china.huawei.com (7.192.104.229) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.24; Fri, 8 Apr 2022 13:45:25 +0800 Subject: Re: [RFC PATCH] ext4: convert symlink external data block mapping to bdev To: Jan Kara CC: , , , , References: <20220406084503.1961686-1-yi.zhang@huawei.com> <20220406171715.35euuzocoe4ljepe@quack3.lan> <806b63ff-975d-123d-5925-587aa026ce94@huawei.com> <20220407135541.i765244kahb6lgqz@quack3.lan> From: Zhang Yi Message-ID: <515c6bbe-ae01-0256-1ae2-128dd0620fb3@huawei.com> Date: Fri, 8 Apr 2022 13:45:24 +0800 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.12.0 MIME-Version: 1.0 In-Reply-To: <20220407135541.i765244kahb6lgqz@quack3.lan> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit X-Originating-IP: [10.174.178.134] X-ClientProxiedBy: dggems702-chm.china.huawei.com (10.3.19.179) To canpemm500005.china.huawei.com (7.192.104.229) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-4.7 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A, RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On 2022/4/7 21:55, Jan Kara wrote: > On Thu 07-04-22 16:14:24, Zhang Yi wrote: >> On 2022/4/7 1:17, Jan Kara wrote: >>> On Wed 06-04-22 16:45:03, Zhang Yi wrote: >>>> Symlink's external data block is one kind of metadata block, and now >>>> that almost all ext4 metadata block's page cache (e.g. directory blocks, >>>> quota blocks...) belongs to bdev backing inode except the symlink. It >>>> is essentially worked in data=journal mode like other regular file's >>>> data block because probably in order to make it simple for generic VFS >>>> code handling symlinks or some other historical reasons, but the logic >>>> of creating external data block in ext4_symlink() is complicated. and it >>>> also make things confused if user do not want to let the filesystem >>>> worked in data=journal mode. This patch convert the final exceptional >>>> case and make things clean, move the mapping of the symlink's external >>>> data block to bdev like any other metadata block does. >>>> >>>> Signed-off-by: Zhang Yi >>>> --- >>>> This RFC patch follow the talking of whether if we could unify the >>>> journal mode of ext4 metadata blocks[1], it stop using the data=journal >>>> mode for the final exception case of symlink's external data block. Any >>>> comments are welcome, thanks. >>>> >>>> [1]. https://lore.kernel.org/linux-ext4/20220321151141.hypnhr6o4vng2sa6@quack3.lan/T/#m84b942a6bb838ba60ae8afd906ebbb987a577488 >>>> >>>> fs/ext4/inode.c | 9 +--- >>>> fs/ext4/namei.c | 123 +++++++++++++++++++++------------------------- >>>> fs/ext4/symlink.c | 44 ++++++++++++++--- >>>> 3 files changed, 93 insertions(+), 83 deletions(-) >>> >>> Hum, we don't save on code but I'd say the result is somewhat more >>> standard. So I guess this makes some sense. Let's see what Ted thinks... >>> >>> Otherwise I've found just one small bug below. >>> >>>> @@ -3270,26 +3296,8 @@ static int ext4_symlink(struct user_namespace *mnt_userns, struct inode *dir, >>>> if (err) >>>> return err; >>>> >>>> - if ((disk_link.len > EXT4_N_BLOCKS * 4)) { >>>> - /* >>>> - * For non-fast symlinks, we just allocate inode and put it on >>>> - * orphan list in the first transaction => we need bitmap, >>>> - * group descriptor, sb, inode block, quota blocks, and >>>> - * possibly selinux xattr blocks. >>>> - */ >>>> - credits = 4 + EXT4_MAXQUOTAS_INIT_BLOCKS(dir->i_sb) + >>>> - EXT4_XATTR_TRANS_BLOCKS; >>>> - } else { >>>> - /* >>>> - * Fast symlink. We have to add entry to directory >>>> - * (EXT4_DATA_TRANS_BLOCKS + EXT4_INDEX_EXTRA_TRANS_BLOCKS), >>>> - * allocate new inode (bitmap, group descriptor, inode block, >>>> - * quota blocks, sb is already counted in previous macros). >>>> - */ >>>> - credits = EXT4_DATA_TRANS_BLOCKS(dir->i_sb) + >>>> - EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3; >>>> - } >>>> - >>>> + credits = EXT4_DATA_TRANS_BLOCKS(dir->i_sb) + >>>> + EXT4_INDEX_EXTRA_TRANS_BLOCKS + 3; >>> >>> This does not seem like enough credits - we may need to allocate inode, add >>> entry to directory, allocate & initialize symlink block. So I think you >>> need to add 4 for block allocation + init in case of non-fast symlink. And >>> please keep the comment explaining what is actually counted in the number >>> of credits... >>> >> >> Thanks for pointing this out, and ext4_mkdir() seems has the same problem >> too because we also need to allocate one more block to store '.' and '..' >> entries for a new created empty directory. > > OK, I was thinking a bit more about this and the comment was actually a bit > misleading AFAICT. So we have: > > EXT4_INDEX_EXTRA_TRANS_BLOCKS for addition of entry into the directory. > +3 for inode, inode bitmap, group descriptor allocation > EXT4_DATA_TRANS_BLOCKS for the data block allocation and modification. > > So things actually look OK, just the comment was wrong and in the old code > the credits were overestimated (because we've allocated the data block in a > separate transaction). > Yes,I will update the comments in my v2 iteration. >> BTW, look the credits calculation in depth, the definition of >> EXT4_DATA_TRANS_BLOCKS is weird, the '-2' subtraction looks wrong. >> >>> #define EXT4_DATA_TRANS_BLOCKS(sb) (EXT4_SINGLEDATA_TRANS_BLOCKS(sb) + \ >>> EXT4_XATTR_TRANS_BLOCKS - 2 + \ >>> EXT4_MAXQUOTAS_TRANS_BLOCKS(sb)) >> >> I see the history log, before commit[1], the '-2' subtract the 2 more duplicate >> counted super block in '3 * EXT3_SINGLEDATA_TRANS_BLOCKS', but after this commit, >> it seems buggy because we have only count the super block once. It's a long time >> ago, I'm not sure am I missing something? > > Yes, -2 looks strange but at the same time I fail to see why > EXT4_XATTR_TRANS_BLOCKS would need to be accounted for normal data > operation and why we're reserving 6 blocks there. I'll raise it on today's > ext4 call if someone remembers... > I guess the 6 blocks were: 1. Ref count update on old xattr block 2. new xattr block 3. block bitmap update for new xattr block 4. group descriptor for new xattr block 5. block bitmap update for old xattr block 6. group descriptor for old block The 5 and 6 looks like were overestimated in cases of 1) we just update the old ref count to no zero, 2) we free the old xattr block and the credits has already counted in the default revoke credits. Thanks, Yi. >> [1]. https://git.kernel.org/pub/scm/linux/kernel/git/history/history.git/commit/?id=2df2c24aa6d2cd56777570d96494b921568b4405