Date: Wed, 13 Apr 2005 13:44:51 -0700
From: Matt Mackall
To: Linus Torvalds
Cc: Andrea Arcangeli, David Eger, Petr Baudis, "Randy.Dunlap",
    Ross Vandegrift, Kernel Mailing List
Subject: Re: Re: more git updates..
Message-ID: <20050413204451.GP25554@waste.org>

On Tue, Apr 12, 2005 at 06:10:27PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 13 Apr 2005, Andrea Arcangeli wrote:
> >
> > I wasn't suggesting to use CVS. I meant that for a newly developed SCM,
> > the CVS/SCCS format as storage may be more appealing than the current
> > git format.
>
> Go wild. I did mine in six days, and you've been whining about other
> peoples SCM's for three years.

I wrote a hack to do efficient delta storage with O(1) seeks for
lookup and append last week, I believe it's been integrated into the
latest Bazaar-NG. I expect it'll give better compression and
performance than BK. Of course it ends up being O(revisions) for
modifications or insertions (but that is probably a non-issue for the
SCM models we're looking at).

The git model is obviously very different, but I worry about the slop
space implied.
With 200k file revisions and an average of 2k slop per file, that's
400MB of slop, or almost the size of an equivalent delta-compressed
kernel repo.

Now if you can assume that blobs never change and are never deleted,
you can simply append them all onto a log, and then index them with a
separate file containing an htree of (sha1, offset, length) or the
like. Since the key is already a strong hash, this is an excellent
match and avoids rehashing in the kernel's directory lookup. And it'll
save an inode, a directory entry, and about half a data block per
entry. "Open" will also be cheaper as there's no per-revision inode to
grab.

I could hack on this if you think it fits with the git model,
otherwise I'll go back to my other experiments..

-- 
Mathematics is the supreme nostalgia of our time.
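
For illustration only, here is a rough Python sketch of the append-only
blob log idea above: blobs are appended to one data file, and a second
file records (sha1, offset, length) for each. Everything named here is
an assumption made for the sketch and not anything from git or
Bazaar-NG: the BlobLog class, the fixed 36-byte index record, hashing
the raw bytes with sha1, and above all the in-memory dict, which merely
stands in for an on-disk htree.

```python
import hashlib
import os
import struct

# Hypothetical index record: 20-byte sha1, 8-byte offset, 8-byte length.
RECORD = struct.Struct(">20sQQ")


class BlobLog:
    """Append-only blob store: one data log plus one flat index file."""

    def __init__(self, log_path, index_path):
        self.log_path = log_path
        self.index_path = index_path
        # sha1 digest -> (offset, length). A dict stands in for the htree.
        self.index = {}
        # Create both files if they do not exist yet.
        for path in (log_path, index_path):
            open(path, "ab").close()
        # Rebuild the in-memory index from the index file.
        with open(index_path, "rb") as idx:
            while True:
                rec = idx.read(RECORD.size)
                if len(rec) < RECORD.size:
                    break
                key, offset, length = RECORD.unpack(rec)
                self.index[key] = (offset, length)

    def put(self, data):
        """Append a blob and return its sha1 digest; duplicates are no-ops."""
        key = hashlib.sha1(data).digest()
        if key in self.index:
            return key
        with open(self.log_path, "ab") as log:
            log.seek(0, os.SEEK_END)
            offset = log.tell()
            log.write(data)
        with open(self.index_path, "ab") as idx:
            idx.write(RECORD.pack(key, offset, len(data)))
        self.index[key] = (offset, len(data))
        return key

    def get(self, key):
        """Fetch a blob with one index lookup and one seek into the log."""
        offset, length = self.index[key]
        with open(self.log_path, "rb") as log:
            log.seek(offset)
            return log.read(length)
```

Because blobs are assumed never to change or be deleted, put() only
ever appends, and the whole store costs two inodes no matter how many
revisions it holds. A real version would want a sorted or tree-shaped
on-disk index so that opening the store does not mean reading every
record.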