From: Amir Goldstein
Date: Wed, 30 Nov 2016 17:05:25 +0200
Subject: Re: [POC/RFC PATCH] overlayfs: constant inode numbers
To: Miklos Szeredi
Cc: "linux-unionfs@vger.kernel.org", linux-fsdevel, linux-kernel
References: <20161125212934.GB2622@veci.piliscsaba.szeredi.hu>

On Tue, Nov 29, 2016 at 11:49 PM, Miklos Szeredi wrote:
> On Tue, Nov 29, 2016 at 1:03 PM, Amir Goldstein wrote:
...
> I meant that we can unify OVL_XATTR_INO with "redirect/fh"
> functionality and get something good out of it.
>
>> Perhaps you meant for non-dir:
>>
>> 5. If redirect_dir=fh, *propagate* lowest-handle on non-dir copy up
>> 6. In ovl_lookup() of non-dir, decode lowest-handle to set oe->ino
>
> Yes.
>
> OVL_XATTR_FH would be safe to ignore, so this is back and forward
> compatible. And the cost is probably not prohibitive, since copy ups
> should be relatively rare.
>
> After a backup + restore it is not expected that we get back the old
> inode numbers so it's fine to ignore the stale file handles.
>

FYI, there are 2 interesting corner cases of "semi stale" handles:
- Layers are copied to the same fs (without deleting the old layers)
- Old layers are deleted, but an old deleted file is still open

I have handled both of these cases in the last version of redirect_fh
that I pushed yesterday, but I am not 100% sure that I handled them
correctly.

Anyway, I will get to work on adjusting redirect_fh for use by stable
inodes.

> The following issues are left:
>
>  - performance of readdir;

Here is one very simple optimization for the WIP:

@@ -157,6 +157,8 @@ static int ovl_fill_lowest(struct ovl_readdir_data *rdd,
 		list_move_tail(&p->l_node, &rdd->middle);
 	} else {
 		p = ovl_cache_entry_new(rdd, name, namelen, ino, d_type);
+		if (p)
+			p->ino = ino;

For non-lowest entries, we can provide a mount option 'readdir_ino'.
With readdir_ino, readdir pays the penalty of a getxattr for every
non-lowest entry (either OVL_XATTR_FH or OVL_XATTR_INO).
Without readdir_ino, readdir will return d_ino = 0, in which case at
least `find -inum` does the right thing (it falls back to fstat for
this dirent).

>  - what to do if not all layers are on the same fs;

Same as what I did for redirect_fh: turn the feature off.
We can also export this state in the /proc/mounts options and maybe
allow stable inodes to be turned off explicitly, but I don't think
that we should, because there shouldn't be any program that relies on
inode numbers NOT being stable.

>  - hard link copy ups.
>

I'll start by setting up a TODO wiki page and writing xfstests for all
of those issues.
Maybe even track them on GitHub.

Amir.