From: Andreas Dilger
Subject: Re: rfc: [patch] change attribute for ext3
Date: Thu, 23 Nov 2006 17:23:11 -0700
Message-ID: <20061124002311.GA32033@schatzie.adilger.int>
In-Reply-To: <20061114221725.GA14024@schatzie.adilger.int>
References: <20060913164202.GA14838@openx1.frec.bull.fr>
 <1158171071.6072.10.camel@lade.trondhjem.org>
 <20060913183001.GA1702@moule.localdomain>
 <20060914092318.GA18911@schatzie.adilger.int>
 <20061114221725.GA14024@schatzie.adilger.int>
To: Alexandre Ratchov
Cc: linux-ext4@vger.kernel.org, nfsv4@linux-nfs.org
List-Id: linux-ext4.vger.kernel.org

On Nov 14, 2006 15:17 -0700, Andreas Dilger wrote:
> On Sep 13, 2006 20:30 +0200, Alexandre Ratchov wrote:
> > On Wed, Sep 13, 2006 at 02:11:11PM -0400, Trond Myklebust wrote:
> > > I would really have preferred a full-blown 64-bit counter as per
> > > RFC3530, but I suppose we could always combine this change attribute
> > > with the high word from ctime in order to make up the NFSv4 change
> > > attribute. That should keep us safe until someone develops a ramdisk
> > > with < 1 nsecond access time.
> >
> > do you mean something like "(ctime.tv_sec << 32) | change_attribute"? this
> > would allow 2^32 inode changes per second.
>
> I've been giving this further thought, and it may be that a full 64-bit
> counter per inode is the only bulletproof solution.
>
> One reason that ctime+nsec as the version number isn't so great is that if
> there is some reason to set the clock backward (i.e. it was incorrectly
> set into the future at some point) the inode ctime may jump backward.
> This could cause either misordering of events, or collisions between
> version numbers. The problem could be mitigated by having the ctime+nsec
> value only increment the nsec component by 1 for each new version (like
> a counter) until real time catches up with the bad ctime, but it might
> leave files with a bad ctime for a long time.
>
> The main drawback of a 64-bit counter is the space in the inode that it
> consumes... I don't think we can find 64 bits of free space in the core
> inode, so this would relegate the solution to new filesystems that are
> formatted with large inodes.

Alexandre, Trond, what do you think about using a 32-bit in-inode version
(sufficient for casual uses of NFSv4), and putting the most-significant
32 bits of the version into the large part of the inode (say after cr_time)?

That allows use of the version on existing ext3 filesystems, and with
large inodes (Lustre, ext4) it also meets the specs of RFC 3530 and any
intended future NFSv4 use.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
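
For illustration, the split-version proposal and the ctime-based alternative quoted above might look roughly like the following C sketch. This is not the patch under discussion: the struct, the field names (version_lo, version_hi, has_large_inode) and the helper names are hypothetical and do not reflect the actual ext3/ext4 on-disk layout. It assumes the low 32 bits always fit in the core inode and the high 32 bits exist only when the filesystem has large inodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the two halves of the version;
 * not the real ext3/ext4 inode layout. */
struct sketch_inode {
    uint32_t version_lo;      /* assumed: 32-bit version in the core inode */
    uint32_t version_hi;      /* assumed: upper 32 bits, large inodes only */
    bool     has_large_inode; /* assumed flag: is version_hi stored at all? */
};

/* Combine the two halves into one 64-bit version number. */
static uint64_t sketch_get_version(const struct sketch_inode *ino)
{
    assert(ino != NULL);
    uint64_t v = ino->version_lo;
    if (ino->has_large_inode)
        v |= (uint64_t)ino->version_hi << 32;
    return v;
}

/* Bump the version by one, carrying into the high half when it exists.
 * On a small inode the counter simply wraps after 2^32 changes. */
static void sketch_inc_version(struct sketch_inode *ino)
{
    uint64_t v = sketch_get_version(ino) + 1;
    ino->version_lo = (uint32_t)v;
    if (ino->has_large_inode)
        ino->version_hi = (uint32_t)(v >> 32);
}

/* The alternative quoted above: seconds of ctime in the high word and a
 * 32-bit change attribute in the low word. */
static uint64_t sketch_ctime_change_attr(int64_t ctime_sec,
                                         uint32_t change_attribute)
{
    return ((uint64_t)ctime_sec << 32) | change_attribute;
}
```

With this layout an existing small-inode filesystem still gets a monotonically increasing 32-bit version until it wraps, while a large-inode filesystem gets the full 64-bit counter. The ctime-based variant stays exposed to the clock-set-backward problem described in the quoted text.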