Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp1136180pxf; Thu, 18 Mar 2021 22:53:34 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxo0KGBMr7Kfsr+oDANQ7ip12nvQI4hjhOrNpk53bYHET7fi8nc+pAc1qfn0y7Ejdp4DeV9 X-Received: by 2002:aa7:cdcf:: with SMTP id h15mr7584805edw.28.1616133214707; Thu, 18 Mar 2021 22:53:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1616133214; cv=none; d=google.com; s=arc-20160816; b=Zm+yiAZGx1y+Q53PA2T1ftl2to+RHzjNj+FtlOe/C3UDIkB8P1U/BLoZKqhgbKA+L8 oiey0gocExAh9vG5SXS3SkhAmLulT7ETruDasAUmucCqe9A1niPTIdYAZ3/67NDju5Di uTkFLY96ELEf4kG4SrY5+p9kyUS3gEotisAVLDGYIzEjOKmjwbbWwTPqIA0mo/z+y4Ij ui8ratIaw3i02bs6XAUaFQzC69dZGm5vPKit8f+oAPk6M9/6apisfigTbcAwOjwbQztn rkDXq1Ak8WXRvBEHeNtABqpaZl/WTPB93os+TBViG5LzgkIoHi1ezMGl9CFDD6Y0OxXP CR0Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=rFEUCsXB/846GPTkksZ0MN9Gughl9FrFefeL2AgymC8=; b=ulurhsZfh/on8gnHZT6bWPZXS2s7BfBoNsZ+ZvOIiulcmMWgMFgw/UEq7XA7Roy+km w7U0ALEJQvNeNTPddGUixNQ8G5s9rJJiA1R9vgQWuimr/+YsQJHQi7/53HrLnGhhh6cq cFFnFMojmS6JJAe1D7m2X9H9QtLV1I5GFHz9ZLVzjwSLt5wEr8VyMJGpZuJ1F+2tVPDq kpBEXQUGK/c5X501L7z8ANDL1yESb+8g1VDIhxbYNYKafbO3jLsXnjs37y+woKfkk2mj bNBUqC8lvv7iYQRuBsmArVwxyQliK1qOJKFXG7gRhrc5pzC23X+WLT3PEcw+xQJBl/Iu FrqA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=jUqDRjzz; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id c11si3317808edw.146.2021.03.18.22.53.07; Thu, 18 Mar 2021 22:53:34 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=jUqDRjzz; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233917AbhCSFwf (ORCPT + 99 others); Fri, 19 Mar 2021 01:52:35 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56700 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233756AbhCSFwG (ORCPT ); Fri, 19 Mar 2021 01:52:06 -0400 Received: from mail-io1-xd2e.google.com (mail-io1-xd2e.google.com [IPv6:2607:f8b0:4864:20::d2e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 26A76C06174A; Thu, 18 Mar 2021 22:52:06 -0700 (PDT) Received: by mail-io1-xd2e.google.com with SMTP id v3so4872788ioq.2; Thu, 18 Mar 2021 22:52:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=rFEUCsXB/846GPTkksZ0MN9Gughl9FrFefeL2AgymC8=; b=jUqDRjzzBert2lOnmK3GKqrOzOpVX/FrIXwQhOgJ3CC9OgXZF8QdYhKCCrcMog46jE zr6dq2oqQgnHmVQbphZJQXkUcmWzBFc46vxdm+Igus4KeeGWCINLHMoL1W2wH5Oabxy6 zOZo7TSLXRFJRW12Hq7uIFISsAapq2tWguVZojl1RWwxa84p8Fy6m6pTp8HVJv7u9fhD PxZiAhdcaB7EXNKQpTRemg7skK9TLZ9f+ZklgtWZvzOjCd6H3i1kPKwTWHvuzv8EO2mL WnGv8kzXMPxrjRI+KCdgDkq0trHm5gqA7r9hh79HUD7FH/AjLIHGDKZ4q9n0SoBsKZrN 6cow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=rFEUCsXB/846GPTkksZ0MN9Gughl9FrFefeL2AgymC8=; b=tVJGTN0fakX26q5Gt3wrt683rwCg0Oz4uoM/EngezJ9tnDDlaQhlJaLxqCPqCGYPg8 2bgcn/gwEt5ez5KPjxDdtZxHf4jVuq4nJPVAbIxoHcgq1TsMW0gVMYggrMYI3mFYEPUW C153WmiuQGt3wxLc1EZRuIbqGdm7Vr1JXV+mtw7RYwGn90YuQFO2si1RAFE5WqyTt2LL zSHuibF3xRz0XiCs9OndqV3iV+sgq2gkA4t0kLMWxZTcgA+YRghIey2W7LldgqXqiYSo RS2J95x8Y+2GKUPZxoE1Wbw8CB5Ya+53Jdx7m8aOyy8tgPlMb20D7EbppugxXO+6Zba4 09mw== X-Gm-Message-State: AOAM532lHgtGXj4/rKyxyKlv+kdjyae8uDc7aranFs+pUsK980NbjYwh hsH6+2MyXiCjHONfbflzc0x81tuo/TVEwTga2k20aHZgWMo= X-Received: by 2002:a6b:4411:: with SMTP id r17mr1520221ioa.64.1616133125575; Thu, 18 Mar 2021 22:52:05 -0700 (PDT) MIME-Version: 1.0 References: <20210316221921.1124955-1-harshadshirwadkar@gmail.com> In-Reply-To: From: Amir Goldstein Date: Fri, 19 Mar 2021 07:51:54 +0200 Message-ID: Subject: Re: [PATCH] ext4: add rename whiteout support for fast commit To: harshad shirwadkar Cc: Ext4 , Theodore Tso , Miklos Szeredi , overlayfs Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org [adding overlayfs list] On Fri, Mar 19, 2021 at 3:32 AM harshad shirwadkar wrote: > > Thanks for the review Amir. > > Sure changing the subject makes sense. > > Also, on further discussions on Ext4 conference call, we also thought > that with this patch, overlayfs customers would not benefit from fast > commits much if they call renames often. So, in order to really make > rename whiteout a fast commit compatible operation, we probably would > need to add support in fast commit to replay a char device creation > event (since whiteout object is a char device). That would imply, we > would need to do careful versioning and would need to burn an on-disk > feature flag. > > An alternative to this would be to have a static whiteout object with > irrelevant nlink count and to have every rename point to that object > instead. Based on how we decide to implement that, at max only the > first rename operation would be fast commit incompatible since that's > when this object would get created. All the further operations would > be fast commit compatible. The big benefit of this approach is that > this way we don't have to add support for char device creation in fast > commit replay code and thus we don't have to worry about versioning. > I'm glad to hear that, Harshad. Please note that creating a static whiteout object on-disk is one possible implementation option. Not creating any object on-disk may be even better. I had suggested the static object approach because it should be pretty simple to implement and add e2fsprogs support for. However, if we look at the requirements for RENAME_WHITEOUT, the resulting directory entry does not actually need to point to any object on-disk at all. An alternative implementation would be to create a directory entry with {d_ino = 0, d_type = DT_WHT}. Lookup of this entry will return a reference to a singleton read-only ext4 whiteout inode object, which does not reside on disk, so fast commit is irrelevant in that sense. i_nlink should be handled carefully, but that should be easier from doing that for a static on-disk object. I am not sure how userland tools handle DT_WHT, but I see that other filesystems can emit this value in theory and man rename(2) claims that BSD uses DT_WHT, so the common tools should be able to handle it. As far as overlayfs is concerned: 1. ovl_lookup() will find an IS_WHITEOUT() inode as usual 2. ovl_dir_read_merged() will need this small patch (below) and will not access the inode object at all 3. At first, ovl_whiteout() -> vfs_whiteout() can still create a usual chardev 4. Later, we can initiate the overlayfs instance singleton whiteout reference in ovl_check_rename_whiteout() and ovl_whiteout() will never get -EMLINK when linking this whiteout object One other challenge is how to handle users trying to make operations on the upper layer directly (migrating images etc). As long as the tools still observe the whiteout as a chadev (with stat(2)) then export and import should work fine (creating a real chardev on import). If there are tools that try to change inode permissions recursively on the upper layer (?) there may be a problem with those read-only whiteouts although the permission of a whiteout is a moot concept. Thanks, Amir. --- a/fs/overlayfs/readdir.c +++ b/fs/overlayfs/readdir.c @@ -161,7 +161,7 @@ static struct ovl_cache_entry *ovl_cache_entry_new(struct ovl_readdir_data *rdd, if (ovl_calc_d_ino(rdd, p)) p->ino = 0; p->is_upper = rdd->is_upper; - p->is_whiteout = false; + p->is_whiteout = (d_type == DT_WHT); if (d_type == DT_CHR) { p->next_maybe_whiteout = rdd->first_maybe_whiteout;