Received: by 2002:a05:6a10:9848:0:0:0:0 with SMTP id x8csp1213840pxf; Fri, 19 Mar 2021 01:38:14 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzEhA3ANwa1uiYv/OnpKOB13v0kpj/SGeASKoHdn7oDcm+sj4xenAzWZCcNo7fFWKjTouX0 X-Received: by 2002:a17:906:8593:: with SMTP id v19mr3078537ejx.32.1616143094641; Fri, 19 Mar 2021 01:38:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1616143094; cv=none; d=google.com; s=arc-20160816; b=Bh2rYr/w30+B3CLPVLBDqjQH1s/bdHwQlAZVG1chiJdX6RWAVVfB9zNPQ8UO7EFMY6 2O0QbXneN0oEIdlpumwAmktiwOzKc6ccUtxAqCn0SrDO1KI7N309XGbpuJpbvEvHw5ex pWkhoSjsbvtczMeBZv3LbtU9F9hmGmeGPJeQsCodrE/pt8an8omWzcJQzZNwUVYmwz0c qGWNmOepXbdNPK42RCKcat1QuurohgTDt1AqSF868pb0dLasPLjDKWnN8Z/WgArRd9Iw 1QQ8Vx7ghaGNcnv8AafaNyiGgI27ax99YEUocNhZ2E7YSrB9j6fBrLPxL4+lk1Vdvm0T w86Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=V4zhu0iRCwggJtth86gVHoOF9LsK67CiVV+65APy0lw=; b=wCZwNxoIdsZYEIK69ljJtJFd2MbfEQ9uecG9cQ1Jnsd7a3XjSZaZEKdUQbABq+f/wd EZ9FKOznblUmE51rN5Mfo8rdRoZaFsLmZUpKwkuulcCDOFqJ/QIo5fr9zlBELLi//DSG ewsR1nywdPDOsrWuhmbl+qAiM065wSMFjIt5dt/zZNHCPr79yjA/4A3yFA/H6e9oCsBU NhRMWwpROc4pQrcVNZD2+5d6tEZuO1QIv894tz3YUIe1aRA0dFVVUzmXXjaVTvcLhO+q SanI7f7eWAza5H9B61ocx287x/mTRlfVL6TGiSfTCKP6892jR7Oo7OxQAXhywStgOril 7w0Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=temperror (no key for signature) header.i=@szeredi.hu header.s=google header.b=NgCrMxdU; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id r18si3492046eds.390.2021.03.19.01.37.46; Fri, 19 Mar 2021 01:38:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=temperror (no key for signature) header.i=@szeredi.hu header.s=google header.b=NgCrMxdU; spf=pass (google.com: domain of linux-ext4-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-ext4-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234290AbhCSIgy (ORCPT + 99 others); Fri, 19 Mar 2021 04:36:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35608 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234288AbhCSIgj (ORCPT ); Fri, 19 Mar 2021 04:36:39 -0400 Received: from mail-vk1-xa33.google.com (mail-vk1-xa33.google.com [IPv6:2607:f8b0:4864:20::a33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 74A8FC06174A for ; Fri, 19 Mar 2021 01:36:39 -0700 (PDT) Received: by mail-vk1-xa33.google.com with SMTP id f11so1905830vkl.9 for ; Fri, 19 Mar 2021 01:36:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=szeredi.hu; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=V4zhu0iRCwggJtth86gVHoOF9LsK67CiVV+65APy0lw=; b=NgCrMxdUv4Dd24ZJIvn8txNnJehnCB0Ef7rmPJEieE+/Pctto1uAw6wc8B+cAtlWPK VogFl9h1lhnO/pQRfQNdNah50HLP2xzpEuCnE3YgrDtMJwGvVcwuif9zFRL72kwm9DBt MWvTGTqzBo/n60C6tmFnyjn7KQ95cqhO2fEH4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=V4zhu0iRCwggJtth86gVHoOF9LsK67CiVV+65APy0lw=; b=GbaBYZ/iRaWjbK8NU2SAYuaydoPMGEPL84vw2xWsTdBrDmDiY6taHU81XM3vVRUa4I wH6MJIkC2xGjEFwm/DwqkgJOdoszxyVKmJGxnQAuyXwqmJPz4D10SXYcqdxIxLnDU7mB fujJkl2b3jcUAN+KsNLYzbAXsb/UVoWhDjEHmgbzh5k0ACg7VzuKwFaNyvaCmp93k8Sv 1oOFa1dUgSNH71qZp4r0hkw/K7YRzpVFpZlee2IdIJobSbQhaF+YrjhO1Q3ggUBkMUqP uPVqxbS0mgozRc84VRHttK7bkMBviZGuQgSKf+0dLzY152t81qIQr3lz9dOlaDiBcwos soUQ== X-Gm-Message-State: AOAM531uEo9rR9Bs1xPQscCenbdHrbaEkWa5ModmdegdvgxVZb7y1HMJ Uow3yEcAltOKIyRKrhF6+HTbZL9VZtYAAVwp3MM2PH1OI5clSA== X-Received: by 2002:a05:6122:54:: with SMTP id q20mr1666620vkn.3.1616142998580; Fri, 19 Mar 2021 01:36:38 -0700 (PDT) MIME-Version: 1.0 References: <20210316221921.1124955-1-harshadshirwadkar@gmail.com> In-Reply-To: From: Miklos Szeredi Date: Fri, 19 Mar 2021 09:36:27 +0100 Message-ID: Subject: Re: [PATCH] ext4: add rename whiteout support for fast commit To: Amir Goldstein Cc: harshad shirwadkar , Ext4 , Theodore Tso , overlayfs Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Fri, Mar 19, 2021 at 6:52 AM Amir Goldstein wrote: > > [adding overlayfs list] > > On Fri, Mar 19, 2021 at 3:32 AM harshad shirwadkar > wrote: > > > > Thanks for the review Amir. > > > > Sure changing the subject makes sense. > > > > Also, on further discussions on Ext4 conference call, we also thought > > that with this patch, overlayfs customers would not benefit from fast > > commits much if they call renames often. So, in order to really make > > rename whiteout a fast commit compatible operation, we probably would > > need to add support in fast commit to replay a char device creation > > event (since whiteout object is a char device). That would imply, we > > would need to do careful versioning and would need to burn an on-disk > > feature flag. > > > > An alternative to this would be to have a static whiteout object with > > irrelevant nlink count and to have every rename point to that object > > instead. Based on how we decide to implement that, at max only the > > first rename operation would be fast commit incompatible since that's > > when this object would get created. All the further operations would > > be fast commit compatible. The big benefit of this approach is that > > this way we don't have to add support for char device creation in fast > > commit replay code and thus we don't have to worry about versioning. > > > > I'm glad to hear that, Harshad. > > Please note that creating a static whiteout object on-disk is one possible > implementation option. Not creating any object on-disk may be even better. I don't really get it. What's the advantage of not having an object? Readdir returning DT_WHT internally might be nice, but I'd be careful with exporting that to userspace, since it's likely to cause more problems that it solves. And on the stat(2) interface adding S_IFWHT or even worse: ENOENT are really out of the question due to backward incompatibility with almost every application. > One other challenge is how to handle users trying to make operations > on the upper layer directly (migrating images etc). > As long as the tools still observe the whiteout as a chadev (with stat(2)) > then export and import should work fine (creating a real chardev on import). Right. Can't mkfs.ext4 just create the static object? That sounds to me like the simplest approach. Thanks, Miklos Thanks, Miklos > > I had suggested the static object approach because it should be pretty > simple to implement and add e2fsprogs support for. > > However, if we look at the requirements for RENAME_WHITEOUT, > the resulting directory entry does not actually need to point to any > object on-disk at all. > > An alternative implementation would be to create a directory entry > with {d_ino = 0, d_type = DT_WHT}. Lookup of this entry will return > a reference to a singleton read-only ext4 whiteout inode object, which > does not reside on disk, so fast commit is irrelevant in that sense. > i_nlink should be handled carefully, but that should be easier from > doing that for a static on-disk object. > > I am not sure how userland tools handle DT_WHT, but I see that > other filesystems can emit this value in theory and man rename(2) > claims that BSD uses DT_WHT, so the common tools should be > able to handle it. > > As far as overlayfs is concerned: > 1. ovl_lookup() will find an IS_WHITEOUT() inode as usual > 2. ovl_dir_read_merged() will need this small patch (below) and will > not access the inode object at all > 3. At first, ovl_whiteout() -> vfs_whiteout() can still create a usual chardev > 4. Later, we can initiate the overlayfs instance singleton whiteout > reference in ovl_check_rename_whiteout() and ovl_whiteout() will > never get -EMLINK when linking this whiteout object > > If there are tools that try to change inode permissions recursively on the > upper layer (?) there may be a problem with those read-only whiteouts > although the permission of a whiteout is a moot concept. > > Thanks, > Amir. > > --- a/fs/overlayfs/readdir.c > +++ b/fs/overlayfs/readdir.c > @@ -161,7 +161,7 @@ static struct ovl_cache_entry > *ovl_cache_entry_new(struct ovl_readdir_data *rdd, > if (ovl_calc_d_ino(rdd, p)) > p->ino = 0; > p->is_upper = rdd->is_upper; > - p->is_whiteout = false; > + p->is_whiteout = (d_type == DT_WHT); > > if (d_type == DT_CHR) { > p->next_maybe_whiteout = rdd->first_maybe_whiteout;